Check data pools for changes or manipulation

kerdkanno, 123RF

Bean Counting

… rsync, integrit, aide – all these tools monitor the system's directory tree and issue an alarm as soon as they detect unauthorized changes.

Data on storage media does not last forever. It can be lost or altered through natural aging, defects in the storage medium, operator mistakes, or even intrusions into the system. Checking regularly whether the data is intact and whether anything has changed is therefore part of every responsible system administrator's job.

To prevent write access to a storage medium or directory, you can use a read-only medium like a DVD or activate the write protection, for example when using an SD card. Experienced users often mount selected directories as read-only or turn on write protection for individual files (see the "Mount Filesystems as Read-Only" box).

Mount Filesystems as Read-Only

The file /etc/fstab lists all partitions the system mounts into the directory tree. In the fourth column of each row, the option ro (read-only) or rw (read-write) defines whether the partition can only be read or also written to (Listing 1).

Another option is to use the immutable flag [1]. This flag is one of a directory entry's advanced attributes, which only a few Linux users know and even fewer use in practice. If File-based Access Control Lists (FACLs) come into play on top of that, for example in the context of SELinux [2] in distributions like Red Hat, Fedora, or CentOS, you can define access rights even more precisely.

With the immutable flag set, the directory entry cannot be changed any more: The file or directory is write protected, and the operating system denies every attempt to modify the data. Only the root user can set and remove this flag on a file. You set it with chattr +i <file> and remove it with chattr -i <file>. The i in the fourth line of Listing 2 shows that the file example.txt carries the immutable flag.

Listing 1

fstab

$ grep data /etc/fstab
/dev/sdb1  /data  ext4  ro  0  0
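To see at a glance which entries are mounted read-only, the fourth column can be evaluated with a few lines of shell. The following is a minimal sketch; the function name fstab_access_modes is hypothetical, and it assumes that an entry listing neither ro nor rw is mounted read-write, which is the usual default.

```shell
#!/bin/bash
# Hypothetical helper: print "<mount point> <ro|rw>" for each fstab entry.
# Usage: fstab_access_modes [file]   (defaults to /etc/fstab)
fstab_access_modes() {
  awk '
    /^[ \t]*#/ || NF == 0 { next }   # skip comments and blank lines
    {
      mode = "rw"                    # assumed default if ro is not listed
      n = split($4, opts, ",")       # fourth column holds the mount options
      for (i = 1; i <= n; i++)
        if (opts[i] == "ro") mode = "ro"
      print $2, mode
    }
  ' "${1:-/etc/fstab}"
}
```

Called without arguments, the function reads /etc/fstab; for the entry from Listing 1 it would report /data as ro.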

Listing 2

Attributes of example.txt

# touch example.txt
# chattr +i example.txt
# lsattr example.txt
----i--------e-- example.txt
# echo "# Comment" >> example.txt
bash: example.txt: Operation not permitted
# chattr -i example.txt
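The check from Listing 2 can also be scripted. The sketch below is only an illustration: Both function names are hypothetical, and it assumes that lsattr prints the attribute string as the first field of its output, as in Listing 2.

```shell
#!/bin/bash
# Succeeds if an lsattr attribute string (e.g. "----i--------e--")
# contains the immutable flag "i".
flags_include_immutable() {
  case "$1" in
    *i*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Hypothetical wrapper: query a file or directory with lsattr and test the flag.
# Returns 2 if lsattr fails (e.g., on filesystems without attribute support).
has_immutable_flag() {
  local attrs
  attrs=$(lsattr -d "$1" 2>/dev/null) || return 2
  attrs=${attrs%% *}               # keep only the attribute field
  flags_include_immutable "$attrs"
}
```

A call like has_immutable_flag example.txt && echo "write protected" could then serve as a guard in maintenance scripts.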

A change in the data pool can affect either its content, through additions and deletions, or the data-access privileges. Possible modifications also include adding, renaming, moving, and deleting files, directories, and (symbolic) links. Your concern as a system administrator is to understand which modifications happened at which point in time, which user made them, and, in case of errors, how you can repair things.

Catching modifications before they take effect, including an appropriate reaction to such an incident, goes beyond the scope of this article, so I will focus on how you can detect such changes retroactively, after they happen. In the examples below, you'll see the steps to follow with rsync and integrit.

In principle, the same procedures can be used with other tools as well (e.g., Tripwire, Aide, and Iwatch). However, the configuration and the evaluation of the results differ from tool to tool.

Rsync

If you have two data pools, for example an original and a backup, you can use the standard tool rsync [3] to detect differences between the two datasets. Rsync was originally designed to synchronize two directories, and it reports on the terminal which entries differ from each other.

Listing 3 demonstrates this with the two directories original/ and copy/. Each contains three initially identical files, but I have modified the data in the copy/ directory. While alright.txt remains unchanged, I set the execution bit for the group on anything.txt and added content to somewhat.txt.

Listing 3

Using rsync to find changes

$ ls -la {original,copy}
copy:
total 16
drwxr-xr-x 2 frank frank 4096 Jun  1 14:28 .
drwxr-xr-x 4 frank frank 4096 Jun  1 14:25 ..
-rw-r-xr-- 1 frank frank   15 Jun  1 16:36 alright.txt
-rw-r-xr-- 1 frank frank   10 Jun  1 14:28 anything.txt
-rw-r--r-- 1 frank frank   24 Jun  1 14:30 somewhat.txt
original:
total 16
drwxr-xr-x 2 frank frank 4096 Jun  1 14:26 .
drwxr-xr-x 4 frank frank 4096 Jun  1 14:25 ..
-rw-r-xr-- 1 frank frank   15 Jun  1 16:36 alright.txt
-rw-r--r-- 1 frank frank   10 Jun  1 14:26 anything.txt
-rw-r--r-- 1 frank frank   10 Jun  1 14:26 somewhat.txt
$ rsync -anv --out-format="[%t]:%o:%f:Last Modified %M" copy/* original
sending incremental file list
[2016/06/01 16:40:25]:send:copy/alright.txt:Last Modified 2016/06/01-16:36:14
[2016/06/01 16:40:25]:send:copy/anything.txt:Last Modified 2016/06/01-14:28:49
[2016/06/01 16:40:25]:send:copy/somewhat.txt:Last Modified 2016/06/01-14:30:23
sent 137 bytes  received 25 bytes  324.00 bytes/sec
total size is 34  speedup is 0.21 (DRY RUN)

Rsync allows a dry run using the -n switch (long option --dry-run). Here you use this mode of operation to detect modifications without actually synchronizing the two directories. With -a (long option --archive), rsync compares both folders taking into account the names of the existing entries, their sizes, their modification times, and the access permissions.

Without any additional options, rsync is rather tight-lipped. Only with -v (long option --verbose) does it show details of the transactions it performs. You can set -v multiple times to increase the level of detail. The additional switch --out-format defines how rsync reports the details of each transaction.

In my example, %t prints the transfer's timestamp, %o the action to be executed (send or receive), %f the file name, and %M the timestamp of the last modification (see Table 1 [4]). For additional help with rsync, see the introductory article [5] and the rsync man page.

Table 1

Rsync Format Placeholders

Placeholder  Meaning
%a           Remote IP address
%b           Number of bytes actually transferred
%B           Permission bits of the file (e.g., rwxrwxrwt)
%c           Total size of the block checksums received for the basis file (only when sending)
%f           File name (long form on sender; no trailing "/")
%G           GID of the file (decimal) or DEFAULT
%h           Remote hostname
%i           Itemized list of what is being updated
%l           Length of the file in bytes
%L           String -> SYMLINK, => HARDLINK, or empty
%m           Module name
%M           Last-modified time of the file
%n           Filename (short form; trailing "/" on dir)
%o           Operation (send, recv, or del)
%p           PID of the rsync session
%P           Module path
%t           Current date and time
%u           Authenticated username or an empty string
%U           UID of the file (decimal)
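The placeholders from Table 1 can be combined freely. As a minimal sketch, the following function wraps a dry run that prints the operation, the short file name, the size, and the permission bits for every entry rsync would transfer. The function name report_changes is hypothetical; the trailing slashes are added so that rsync compares the contents of the two directories.

```shell
#!/bin/bash
# Hypothetical helper: dry run from directory $1 to directory $2 that lists
# every entry rsync would transfer, without changing anything on disk.
report_changes() {
  rsync -an --out-format='%o %n (%l bytes, %B)' "${1%/}/" "${2%/}/"
}
```

With the directories from Listing 3, the call would be report_changes copy original. Entries that are identical in size and modification time do not appear in the output.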

Because you only care about the modified entries, you can also combine rsync -i (long form --itemize-changes) with the filter tool grep. From the detailed but still compact output of rsync, you filter out only the lines that report modifications. All other lines are dropped.

The output contains one line per file, each of which starts with a >. The next character gives the file type (f for a regular file), and the remaining characters represent the properties rsync uses to compare the two entries. If there is a dot in any of these positions, there is no difference between the files regarding that property. If there is a letter, there is a modification. For example, c stands for checksum, meaning the files' checksums or hash values are different; s indicates different sizes; t indicates different modification times; and p indicates different permissions.

You filter the relevant lines from the output by using grep and an appropriate regular expression. The expression used in Listing 4 matches sequences that start with an f, followed by an arbitrary character, which is followed by either a dot and tp, st and a dot, or three dots. The last two lines of Listing 4 contain the matches.

Listing 4

Comparing checksums

$ rsync -acniv copy/* original | grep --color -E "f.(\.tp|st\.|\.\.\.)"
>f..tp..... anything.txt
>fcst...... somewhat.txt
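If you do not want to memorize the letters, a small function can translate them. The sketch below covers only the six properties c, s, t, p, o (owner), and g (group) and ignores all other positions; the function name describe_itemized is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: translate an itemized string such as ">fcst......"
# into the names of the properties that differ.
describe_itemized() {
  local attrs=${1:2} out="" i   # drop update type and file type
  if [[ $attrs == +* ]]; then   # rsync marks newly created entries with "+"
    echo "new"
    return
  fi
  for ((i = 0; i < ${#attrs}; i++)); do
    case ${attrs:i:1} in
      c)   out+=" checksum" ;;
      s)   out+=" size" ;;
      t|T) out+=" mtime" ;;
      p)   out+=" permissions" ;;
      o)   out+=" owner" ;;
      g)   out+=" group" ;;
    esac
  done
  out=${out# }
  echo "${out:-unchanged}"
}
```

For the two lines from Listing 4, describe_itemized ">f..tp....." reports mtime permissions, and describe_itemized ">fcst......" reports checksum size mtime.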

The -c switch in the rsync call is a special case: It makes the program compare the files not only by size and timestamp but also by a checksum in the form of a hash value (see the "Hash Values" box). This way, you can also trace modifications that leave the file size unchanged and where the timestamp has been set back to the original date afterward.
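The case that -c is meant to catch can be reproduced with standard tools. The following sketch is an illustration only: The function name silently_modified is hypothetical, and it assumes GNU stat (option -c) and sha256sum, as found on typical Linux systems.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if two files agree in size and modification
# time (so a quick comparison sees no difference) but differ in content.
silently_modified() {
  local a=$1 b=$2
  # %s = size in bytes, %Y = modification time in seconds since the epoch
  [ "$(stat -c '%s %Y' "$a")" = "$(stat -c '%s %Y' "$b")" ] || return 1
  [ "$(sha256sum < "$a")" != "$(sha256sum < "$b")" ]
}
```

A file whose content was replaced with data of the same length and whose timestamp was reset with touch -r passes the first check and is caught only by the hash comparison.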

Hash Values

Hash functions are cryptographic methods that can be used to calculate checksums. On Linux, you can use the tools md5sum (MD5 with 128 bits), sha1sum (SHA1 with 160 bits), sha224sum (SHA2 with 224 bits), sha256sum (SHA2 with 256 bits), sha384sum (SHA2 with 384 bits), and sha512sum (SHA2 with 512 bits). The number in the name usually states the length of the resulting hash value in bits; MD5 and SHA1 are the exceptions. If your system isn't equipped with any of the listed applications, you can use openssl, which also calculates hash values.
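The fallback to openssl can be wrapped in a small function. This is a sketch under the assumption that openssl dgst prints the hash as the last field of its output; the function name sha256_of is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the SHA256 hash of a file, using sha256sum
# if it is installed and openssl otherwise.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum < "$1" | awk '{ print $1 }'          # hash is the first field
  else
    openssl dgst -sha256 < "$1" | awk '{ print $NF }'  # hash is the last field
  fi
}
```

Reading the file from standard input keeps the file name out of the output, so only the hash itself needs to be extracted.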

Content Parity

If you want to check quickly whether two files have the same content, the Linux tools cmp, comm, diff, and sdiff only help to a certain degree. They work line-by-line, byte-by-byte, or block-by-block and are excruciatingly slow in some cases. Instead, the shell script from Listing 5 uses the SHA256 operation on lines 3 and 4, because MD5 and SHA1 aren't considered to be safe anymore.

Listing 5

Comparing with SHA256

#!/bin/bash
# Create hash values
hashValue1=$(sha256sum "$1" | awk '{ print $1 }')
hashValue2=$(sha256sum "$2" | awk '{ print $1 }')
# Compare hash values
if [ "$(echo -e "$hashValue1\n$hashValue2" | uniq | wc -l)" -eq 1 ]; then
  echo "$1 and $2 are identical."
  exit 0
fi
echo "$1 and $2 are not identical."
exit 1

The more compact Listing 6 solves the problem with less overhead but requires a deeper understanding of shell programming. You execute it with two files as parameters. Following usual Unix practice, the return value 0 in line 3 stands for parity, and the value 1 in line 6 for disparity.

Listing 6

Comparing with SHA256 (II)

#!/bin/bash
if [ "$(sha256sum "$1" | awk '{ print $1 }')" = "$(sha256sum "$2" | awk '{ print $1 }')" ]; then
  echo "$1 and $2 are identical."; exit 0
fi
echo "$1 and $2 are not identical."
exit 1
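Because the comparison reports its result through the return value, it can be reused in a loop over a whole directory. The following sketch applies the same SHA256 comparison to every regular file in an original directory and its copy; the function names same_content and report_differences are hypothetical, and subdirectories are not followed.

```shell
#!/bin/bash
# Succeeds (return value 0) if both files have the same SHA256 hash.
same_content() {
  [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}

# Hypothetical helper: compare each regular file in directory $1
# with the file of the same name in directory $2.
report_differences() {
  local orig=$1 copy=$2 f name
  for f in "$orig"/*; do
    [ -f "$f" ] || continue          # skip directories and other entries
    name=${f##*/}
    if [ ! -f "$copy/$name" ]; then
      echo "missing: $name"
    elif ! same_content "$f" "$copy/$name"; then
      echo "differs: $name"
    fi
  done
}
```

For the directories from Listing 3, the call report_differences original copy would list only the files whose content differs; a changed permission bit alone does not alter the hash.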
