I wrote and used these snippets to compare two root-filesystem backups from a KVM but they could be used for any similar directory hierarchy. The snippets helped me spot deltas between the two hierarchies and confirm the newer backup superseded the older one, so the old copy could be discarded.
This Gist presents a concise, practical set of shell snippets for comparing two similar directory hierarchies (trees), both visually and by content. The motivating example is a pair of root-filesystem backups, but the snippets work for any pair of paths. The gist outlines three main approaches:
- `comm` → review output via `less -S`.
- `vimdiff` → interactively explore differences.
- `rhash` → compare checksums for exact binary verification, then view differences with `comm` or `vimdiff`.
- Great for backup verification, spotting file changes, and ensuring integrity.
- Provides well-documented, adaptable, and safe pipelines.
- Useful for sysadmins, DevOps engineers, and anyone working with filesystem snapshots.
- Using `comm`
  - Generate sorted lists of files (including their paths and sizes) from two directories, optionally pruning specific paths (like `./var`).
  - Use `comm -3` to show items unique to each path list.
- Using `vimdiff`
  - Similar to `comm`, but instead pipes the results into `vimdiff` for a side-by-side interactive visual diff.
- Using `rhash` with recursive checksum comparison
  - Generate SFV (CRC32) checksum lists of each directory’s non-empty files (`--sfv`).
  - Optionally prune subdirectories and skip performance stats.
  - Use `comm -3` or `vimdiff` on sorted checksum lists to highlight true content differences.
- Subshell usage preserves the current working directory and anchors the comparison by removing parent paths that would otherwise cause mismatches.
- Progress feedback (`time`, `--speed`, `--percents`)
- Safe handling of filenames with whitespace (`-print0`, `xargs -0`)
- Filtering out SFV comment lines (`grep -v '^;'`)
- Ensuring proper sorting for reliable diffs
- Efficient browsing with `less -S`
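The subshell anchoring mentioned in the first item can be tried in isolation. The following is a minimal sketch, not part of the original snippets: the temporary tree and the names `demo_root`, `anchored.list`, and `unanchored.list` are hypothetical, and it assumes `mktemp -d` is available.

```shell
# Hypothetical throwaway tree, created only for this demo.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/backup-a/etc"
echo 'hello' > "$demo_root/backup-a/etc/motd"

before=$PWD

# The parentheses start a subshell, so the cd is forgotten when it exits.
# find starts at ".", so every path is relative to the backup root.
# (&& instead of ; is a small deviation: find is skipped if the cd fails.)
( cd "$demo_root/backup-a" && find . -type f ) > "$demo_root/anchored.list"

after=$PWD

# Without the cd, the parent directory is part of every path, so the same
# file in two different backups would never compare as equal.
find "$demo_root/backup-a" -type f > "$demo_root/unanchored.list"

anchored=$(cat "$demo_root/anchored.list")
unanchored=$(cat "$demo_root/unanchored.list")
echo "anchored:   $anchored"
echo "unanchored: $unanchored"
[ "$before" = "$after" ] && echo "cwd unchanged: $after"

rm -rf "$demo_root"
```

The anchored listing contains only `./etc/motd`, which is what makes lists from two different mount points directly comparable.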
- Quick Directory Diffs with Size Context: Spot missing or added files using `comm`.
- Visual Confirmation: Explore side-by-side diffs interactively with `vimdiff`.
- Content Integrity Check: Verify identical files using checksum-based methods with `rhash`.
- Flexible Pruning: Exclude irrelevant paths (like logs or caches).
- Safe File Handling: Use proper flags and piping to handle edge cases (spaces, special chars).
- Deeper Tool Understanding: See how Unix tools like `comm`, `vimdiff`, `rhash`, `grep`, and `less` combine for robust comparisons.
- Fast and basic: Use `comm` for quick detection of additions/deletions.
- Visual and interactive: Use `vimdiff` for detailed inspection.
- Thorough and content-based: Use `rhash` to ensure files are truly identical.
- Change directory paths to suit your case.
- Adjust prunes (e.g., skip `./var`).
- For large sets, remove performance flags in `rhash` for speed.
- Runs two subshells, each producing a list of file paths and file sizes, sorted by path:
  - Each subshell is anchored to the specified directory.
  - Left: `cd` to `/mnt/tmp-vey-disk-2`, skip `./var`, `find` files and print path+size, `sort` by path.
  - Right: `cd` to `/mnt/tmp-vey-disk-1`, `find` files, print path+size, `sort` by path.
- `comm -3` compares the two sorted streams and prints lines unique to each (suppresses common lines).
- Pipe to `less -S` to view results with horizontal scrolling.
comm -3 \
<( cd /mnt/tmp-vey-disk-2 ; find . -path './var' -a -prune -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
<( cd /mnt/tmp-vey-disk-1 ; find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 ) | less -S
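To see what the `comm -3` output looks like before pointing the pipeline at real backups, here is a small sketch on two throwaway trees. It is an illustration under assumptions, not the author's method: the trees, the helper `list_tree`, and the file names are hypothetical; it assumes GNU `find` (for `-printf`) and `mktemp -d`; it writes the lists to temporary files instead of using process substitution so that it also runs under a plain POSIX `sh`; and it pins `LC_ALL=C` so that `sort` and `comm` agree on ordering.

```shell
# Hypothetical throwaway trees standing in for the two backups.
demo=$(mktemp -d)
mkdir -p "$demo/old/etc" "$demo/new/etc" "$demo/new/var/log"
printf 'a\n'      > "$demo/old/etc/hosts"       # 2 bytes
printf 'a\n'      > "$demo/new/etc/hosts"       # same path, same size
printf 'short\n'  > "$demo/old/etc/fstab"       # 6 bytes
printf 'longer\n' > "$demo/new/etc/fstab"       # 7 bytes: size differs
printf 'x\n'      > "$demo/new/var/log/syslog"  # pruned below

tab=$(printf '\t')

# list_tree DIR [PRUNE]: print "path<TAB>size" for regular files under DIR,
# sorted by path. PRUNE (optional) is a path such as ./var to skip.
# The function body is a subshell, so the cd does not leak to the caller.
list_tree() (
    cd "$1" || exit 1
    if [ -n "${2:-}" ]; then
        find . -path "$2" -prune -o \( -type f -printf '%p\t%s\n' \)
    else
        find . -type f -printf '%p\t%s\n'
    fi | LC_ALL=C sort -t "$tab" -k1,1
)

list_tree "$demo/new" ./var > "$demo/new.list"
list_tree "$demo/old"       > "$demo/old.list"

# Column 1: lines only in new.list. Column 2 (tab-indented): only in old.list.
result=$(LC_ALL=C comm -3 "$demo/new.list" "$demo/old.list")
printf '%s\n' "$result"

rm -rf "$demo"
```

Only `./etc/fstab` shows up, once per column with its differing size. `./etc/hosts` is identical on both sides and therefore suppressed, and `./var/log/syslog` never reaches the list because of the prune.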
Same as the `comm -3` invocation above, but swaps `comm` for `vimdiff` and drops the `less -S` pager:
vimdiff \
<( cd /mnt/tmp-vey-disk-2 ; find . -path './var' -a -prune -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
<( cd /mnt/tmp-vey-disk-1 ; find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 )
The following pipelines generate an SFV checksum list for each tree with `rhash`:
( cd ./tmp-vey-disk-1 ; time find . -size +0c -a -type f -a -print0 \
| xargs -0 -- rhash --speed --percents --sfv -- \
) > /tmp/left-tmp-find-pipeline.sfv
( cd ./tmp-vey-disk-2 ; time find . -path './var' -a -prune -o \( -size +0c -a -type f -a -print0 \) \
| xargs -0 -- rhash --speed --percents --sfv -- \
) > /tmp/right-tmp-find-pipeline.sfv
The second pipeline is the same as the first, except that it prunes `./var`.
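One detail of the prune idiom is easy to miss: the right-hand side of `-o` carries an explicit action (`-print0` in the pipeline, `-print` here). The sketch below, on a hypothetical throwaway tree and assuming `mktemp -d`, shows what happens when that action is left out.

```shell
# Hypothetical throwaway tree with one directory we want to skip.
demo=$(mktemp -d)
mkdir -p "$demo/etc" "$demo/var/cache"
echo 'keep' > "$demo/etc/hosts"
echo 'skip' > "$demo/var/cache/blob"

# Explicit action on the right-hand side of -o: only regular files outside
# ./var are printed, and find never descends into ./var.
pruned=$( cd "$demo" && find . -path './var' -prune -o \( -type f -print \) | sort )

# No explicit action: find applies an implicit -print to the whole
# expression, so the pruned directory itself shows up in the listing.
implicit=$( cd "$demo" && find . -path './var' -prune -o -type f | sort )

printf 'explicit action:\n%s\n' "$pruned"
printf 'implicit -print:\n%s\n' "$implicit"

rm -rf "$demo"
```

With the explicit action the listing is just `./etc/hosts`. Without it, `./var` is listed as well, which in the checksum pipeline would hand a directory name to `rhash`.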
💡🏁 `rhash` will be faster with a lot of small files if you remove the `--speed --percents` options, which will mute the statistics calculations and output.
comm -3 \
<( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort ) \
<( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort ) | less -S
vimdiff \
<( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort ) \
<( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort )
- Runs a subshell rooted at the specified directory, so the current shell's cwd is unchanged.
- Inside that subshell:
  - `time` measures and prints the wall/CPU time for the pipeline.
  - `find . -size +0c -a -type f -a -print0`
    - starts at `.`, finds regular files (`-type f`) with size > 0 bytes (`-size +0c`).
    - `-print0` emits NUL-terminated filenames (safe for whitespace/newlines).
  - `| xargs -0 -- rhash --speed --percents --sfv --`
    - `xargs -0` reads the NUL-separated names and supplies them as arguments to `rhash`.
    - the `--` after `xargs` stops option parsing; the `--` passed to `rhash` indicates no more options (file args follow).
    - `rhash` options: `--speed` and `--percents` show progress/performance. `--sfv` requests SFV (CRC32) output.
    - Given file arguments, `rhash` computes checksums and writes the SFV to stdout.
- The final shell redirection (`> /tmp/left-tmp-find-pipeline.sfv`) captures rhash's stdout (the SFV) into `/tmp/left-tmp-find-pipeline.sfv`.
Summary: In a subshell rooted at `./tmp-vey-disk-1`, this finds all non-empty regular files, computes CRC32 SFV entries for them with progress output, saves the SFV to `/tmp/left-tmp-find-pipeline.sfv`, and reports timing.
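The effect of `-print0` with `xargs -0`, and of `-size +0c`, can be checked without `rhash` installed. In this sketch the `sh -c 'printf ...' sh` command is only a stand-in for `rhash` that prints each argument it receives on its own line; the tree and file names are hypothetical, and `mktemp -d` is assumed to be available.

```shell
# Hypothetical throwaway tree with an awkward file name and an empty file.
demo=$(mktemp -d)
mkdir -p "$demo/tree"
echo 'one' > "$demo/tree/plain.txt"
echo 'two' > "$demo/tree/with space.txt"
: > "$demo/tree/empty.txt"   # zero bytes, so -size +0c excludes it

# NUL-delimited: each file name reaches the command as exactly one argument.
safe=$( cd "$demo/tree" && find . -size +0c -type f -print0 \
        | xargs -0 -- sh -c 'printf "%s\n" "$@"' sh | sort )

# Whitespace-delimited (plain xargs): "with space.txt" is split in two.
unsafe=$( cd "$demo/tree" && find . -size +0c -type f -print \
          | xargs -- sh -c 'printf "%s\n" "$@"' sh | sort )

safe_count=$(printf '%s\n' "$safe" | wc -l)
unsafe_count=$(printf '%s\n' "$unsafe" | wc -l)

printf 'with -print0 | xargs -0 (%s arguments):\n%s\n' "$safe_count" "$safe"
printf 'with plain xargs (%s arguments):\n%s\n' "$unsafe_count" "$unsafe"

rm -rf "$demo"
```

The NUL-delimited run passes two arguments, one per non-empty file. The plain run passes three, because the name containing a space is split into `./with` and `space.txt`, neither of which exists.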
The second pipeline (for `./tmp-vey-disk-2`) works the same way, but prunes `./var` during the `find` invocation.
- `comm -3 A B`
  - Compares two sorted text streams A and B.
  - `-3` suppresses column 3 (lines common to both), leaving only lines unique to A (output column 1) and unique to B (output column 2).
- Process substitution for left stream: `<( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort )`
  - `grep -v '^;' /tmp/left-tmp-find-pipeline.sfv`: Remove lines starting with ';' (rhash SFV comment/header lines).
  - `| grep .`: Remove empty lines (keep only non-blank lines).
  - `| sort`: Ensure the stream is sorted; required by `comm` to work correctly.
- Process substitution for right stream: `<( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort )`
  - Same steps as left but operating on the right SFV file.
- Pipe to `less -S`
  - `less -S` chops long lines instead of wrapping them and lets you scroll horizontally.
  - Useful because SFV lines contain a path followed by its checksum and can be long.
- Overall effect
  - Produce a paged view of entries present in only one SFV or the other (paths and checksums), excluding SFV comment/header lines and blank lines. This helps spot files that differ at the binary level between the left and right directory hierarchies.
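The filtering and comparison steps above can be exercised on hand-written input. This is a sketch under stated assumptions: the two SFV files are invented stand-ins (the checksums are made up and the header lines only imitate the `;` comment style), the helper `clean_sfv` is hypothetical, temporary files replace process substitution so that it also runs under a plain POSIX `sh`, and `LC_ALL=C` is pinned so `sort` and `comm` agree on ordering.

```shell
demo=$(mktemp -d)

# Hand-written stand-ins for rhash --sfv output; checksums are invented.
cat > "$demo/left.sfv" <<'EOF'
; illustrative header line
./etc/hosts 1A2B3C4D
./etc/fstab 00C0FFEE

EOF
cat > "$demo/right.sfv" <<'EOF'
; illustrative header line, different on purpose
./etc/hosts 1A2B3C4D
./etc/fstab DEADBEEF
EOF

# clean_sfv FILE: drop ';' comment lines and blank lines, then sort.
clean_sfv() {
    grep -v '^;' "$1" | grep . | LC_ALL=C sort
}

clean_sfv "$demo/left.sfv"  > "$demo/left.sorted"
clean_sfv "$demo/right.sfv" > "$demo/right.sorted"

# Column 1: entries only on the left. Column 2 (tab-indented): only on the right.
diff_lines=$(LC_ALL=C comm -3 "$demo/left.sorted" "$demo/right.sorted")
printf '%s\n' "$diff_lines"

rm -rf "$demo"
```

The differing header lines never reach `comm`, the identical `./etc/hosts` entry is suppressed, and `./etc/fstab` appears once in each column with its two different checksums.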
The `vimdiff` variant feeds the same two filtered, sorted streams into a side-by-side visual diff instead of `comm -3`.