Side-by-side visual comparison of similar directory hierarchies with comm, vimdiff and rhash

I wrote and used these snippets to compare two root-filesystem backups from a KVM but they could be used for any similar directory hierarchy. The snippets helped me spot deltas between the two hierarchies and confirm the newer backup superseded the older one, so the old copy could be discarded.

Summary

This Gist presents a concise, practical set of shell snippets for comparing two similar directory hierarchies (trees), both visually and functionally. The worked example compares two root-filesystem backups, but the snippets work for any pair of paths. The gist outlines three main approaches:

  1. comm → review output via less -S.
  2. vimdiff → interactively explore differences.
  3. rhash → compare checksums for exact binary verification, then view differences with comm or vimdiff.
  • Great for backup verification, spotting file changes, and ensuring integrity.
  • Provides well-documented, adaptable, and safe pipelines.
  • Useful for sysadmins, DevOps engineers, and anyone working with filesystem snapshots.

Details

  1. Using comm

    • Generate sorted lists of files (including their paths and sizes) from two directories, optionally pruning specific paths (like ./var).
    • Use comm -3 to show items unique to each path list.
  2. Using vimdiff

    • Similar to comm, but instead pipes the results into vimdiff for a side-by-side interactive visual diff.
  3. Using rhash with recursive checksum comparison

    • Generate SFV (CRC32) checksum lists of each directory’s non-empty files (--sfv).
    • Optionally prune subdirectories and skip performance stats.
    • Use comm -3 or vimdiff on sorted checksum lists to highlight true content differences.

Pipeline best practices covered

  • Subshell usage preserves the current working directory and anchors the comparison by removing parent paths that would otherwise cause mismatches.
  • Progress feedback (time, --speed, --percents)
  • Safe handling of filenames with whitespace (-print0, xargs -0)
  • Ignoring SFV comment lines (grep -v '^;')
  • Ensuring proper sorting for reliable diffs
  • Efficient browsing with less -S

What a reader can learn

  • Quick Directory Diffs with Size Context
    Spot missing or added files using comm.

  • Visual Confirmation
    Explore side-by-side diffs interactively with vimdiff.

  • Content Integrity Check
    Verify identical files using checksum-based methods with rhash.

  • Flexible Pruning
    Exclude irrelevant paths (like logs or caches).

  • Safe File Handling
    Use proper flags and piping to handle edge cases (spaces, special chars).

  • Deeper Tool Understanding
    See how Unix tools like comm, vimdiff, rhash, grep, and less combine for robust comparisons.

Reader's guide

1. Choose Your Comparison Strategy

  • Fast and basic: Use comm for quick detection of additions/deletions.
  • Visual and interactive: Use vimdiff for detailed inspection.
  • Thorough and content-based: Use rhash to ensure files are truly identical.

2. Adapt the Snippets

  • Change directory paths to suit your case.
  • Adjust prunes (e.g., skip ./var).
  • For large sets of small files, drop rhash's --speed and --percents flags to avoid the statistics overhead.
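
As one hedged illustration of adjusting the prunes, the sketch below skips more than one subtree by grouping the -path tests. The throwaway tree and the ./tmp prune are made up for the demo (only ./var appears in the snippets of this gist), and GNU find is assumed for -printf.

```shell
#!/usr/bin/env bash
# Sketch: prune several subtrees in one find invocation (GNU find assumed).
set -eu

demo=$(mktemp -d)                       # throwaway tree, hypothetical layout
mkdir -p "$demo/etc" "$demo/var/log" "$demo/tmp"
printf 'x'  > "$demo/etc/hosts"
printf 'yy' > "$demo/var/log/syslog"
printf 'z'  > "$demo/tmp/scratch"

# Group the -path tests with \( ... -o ... \) so -prune applies to either match.
# The command substitution is a subshell, so the cd does not leak out.
pruned_list=$(
  cd "$demo"
  find . \( -path './var' -o -path './tmp' \) -prune \
       -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1
)
printf '%s\n' "$pruned_list"
rm -rf "$demo"
```

Only ./etc/hosts survives here, because everything under ./var and ./tmp is pruned before the -type f test is reached.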

The snippets

comm pipeline

  • Runs two process substitutions (subshells), each producing a list of file paths and sizes sorted by path:
    • Subshells are anchored to the specified dir
    • Left: cd to /mnt/tmp-vey-disk-2, skip ./var, find files and print path+size, sort by path.
    • Right: cd to /mnt/tmp-vey-disk-1, find files, print path+size, sort by path.
  • comm -3 compares the two sorted streams and prints lines unique to each (suppresses common lines).
  • Pipe to less -S to view results with horizontal scrolling.
comm -3 \
  <( cd /mnt/tmp-vey-disk-2 ; find . -path './var' -a -prune -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
  <( cd /mnt/tmp-vey-disk-1 ; find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 ) | less -S
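
To get a feel for the output before pointing the pipeline at real backups, here is a small self-contained sketch. The two throwaway trees and their file names are invented for the demo, GNU find is assumed for -printf, and pinning LC_ALL=C is my assumption to keep sort and comm on the same collation.

```shell
#!/usr/bin/env bash
# Sketch: run the comm pipeline against two tiny, made-up trees.
set -eu
export LC_ALL=C                          # assumption: same collation for sort and comm

work=$(mktemp -d)
mkdir -p "$work/left/etc" "$work/right/etc"
printf 'same'  > "$work/left/etc/hosts"      # identical on both sides
printf 'same'  > "$work/right/etc/hosts"
printf 'old'   > "$work/left/etc/motd"       # size differs between the sides
printf 'newer' > "$work/right/etc/motd"
printf 'only'  > "$work/right/etc/fstab"     # exists only on the right

result=$(comm -3 \
  <( cd "$work/left"  ; find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 ) \
  <( cd "$work/right" ; find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 ))
printf '%s\n' "$result"
rm -rf "$work"
```

Lines without indentation are unique to the left stream, and lines indented by one tab are unique to the right stream. A file whose size changed shows up twice, once per side, while ./etc/hosts is suppressed because it is identical in both lists.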

vimdiff pipeline

Same as the previous invocation, but with vimdiff in place of comm and without the pipe to less -S.

vimdiff \
  <( cd /mnt/tmp-vey-disk-2 ; find . -path './var' -a -prune -o \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1 ) \
  <( cd /mnt/tmp-vey-disk-1 ; find . -type f -a -printf '%p\t%s\n' | sort -t$'\t' -k1,1 )
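
Since the comm and vimdiff invocations share the same listing step, it could be factored into a small helper. This is only a sketch: list_tree is a hypothetical function name that is not part of the gist, and GNU find is assumed for -printf.

```shell
#!/usr/bin/env bash
# Sketch: one listing helper shared by the comm and vimdiff invocations.

# list_tree DIR [PRUNE_PATH...]   (hypothetical helper)
# Prints "path<TAB>size" for every regular file under DIR, sorted by path.
# PRUNE_PATH values are relative to DIR, for example ./var
list_tree() (
  # The ( ... ) function body is a subshell, so the caller's cwd is untouched.
  cd "$1" || exit 1
  shift
  prune=()
  for p in "$@"; do
    prune+=( -path "$p" -prune -o )
  done
  find . "${prune[@]}" \( -type f -a -printf '%p\t%s\n' \) | sort -t$'\t' -k1,1
)

# Usage with the paths from this gist (commented out, as they may not exist):
# comm -3 <( list_tree /mnt/tmp-vey-disk-2 ./var ) <( list_tree /mnt/tmp-vey-disk-1 ) | less -S
# vimdiff <( list_tree /mnt/tmp-vey-disk-2 ./var ) <( list_tree /mnt/tmp-vey-disk-1 )
```

Shell functions are inherited by the subshells that process substitution creates, so the helper can be called directly inside <( ... ).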

What about a recursive hash comparison?

Here is a set of pipelines that should give you a useful result:

rhash the left-hand side

( cd ./tmp-vey-disk-1 ; time find . -size +0c -a -type f -a -print0 \
  | xargs -0 -- rhash --speed --percents --sfv -- \
) > /tmp/left-tmp-find-pipeline.sfv

rhash the right-hand side

( cd ./tmp-vey-disk-2 ; time find . -path './var' -a -prune -o \( -size +0c -a -type f -a -print0 \) \
  | xargs -0 -- rhash --speed --percents --sfv -- \
) > /tmp/right-tmp-find-pipeline.sfv

Same as the left-hand side, but prunes ./var during the find invocation.

💡🏁 rhash should run faster over a lot of small files if you remove the --speed and --percents options, which turns off the statistics calculations and output.

Compare the results with comm

comm -3 \
  <( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort ) \
  <( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort ) | less -S

Compare the results with vimdiff

vimdiff \
  <( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort ) \
  <( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort )

Explainer for creating the left-hand checksums

  • Runs in a subshell, so the current shell's cwd is unchanged and the pipeline is rooted at the specified dir.

  • Inside that subshell:

    • time measures and prints the wall/CPU time for the pipeline.
    • find . -size +0c -a -type f -a -print0
      • starts at ., finds regular files (-type f) with size > 0 bytes (-size +0c).
      • -print0 emits NUL-terminated filenames (safe for whitespace/newlines).
    • | xargs -0 -- rhash --speed --percents --sfv --
      • xargs -0 reads the NUL-separated names and supplies them as arguments to rhash.
      • the -- after xargs stops option parsing; the -- passed to rhash indicates no more options (file args follow).
      • rhash options:
        • --speed and --percents show progress/performance.
        • --sfv requests SFV (CRC32) output.
        • Given file arguments, rhash computes checksums and writes the SFV to stdout.
  • The final shell redirection (> /tmp/left-tmp-find-pipeline.sfv) captures rhash's stdout (the SFV) into /tmp/left-tmp-find-pipeline.sfv.

Summary: In a subshell rooted at ./tmp-vey-disk-1, this finds all non-empty regular files, computes CRC32 SFV entries for them with progress output, saves the SFV to /tmp/left-tmp-find-pipeline.sfv, and reports timing.
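
The following sketch illustrates why the pipeline pairs -print0 with xargs -0. It does not call rhash: a tiny sh -c one-liner stands in for it and simply reports how many arguments it received. The file names are invented for the demo.

```shell
#!/usr/bin/env bash
# Sketch: NUL-separated vs whitespace-separated hand-off from find to xargs.
set -eu

demo=$(mktemp -d)
printf 'data' > "$demo/plain.txt"
printf 'data' > "$demo/name with spaces.txt"
: > "$demo/empty.txt"                    # zero bytes, excluded by -size +0c

# Stand-in for rhash: print the number of file arguments received.
nul_count=$(cd "$demo" ; find . -size +0c -a -type f -a -print0 \
  | xargs -0 -- sh -c 'printf "%s\n" "$#"' sh)

# Same search with newline output and default xargs word splitting.
ws_count=$(cd "$demo" ; find . -size +0c -a -type f -a -print \
  | xargs -- sh -c 'printf "%s\n" "$#"' sh)

printf 'with -print0 / -0: %s args\n' "$nul_count"
printf 'with plain -print: %s args\n' "$ws_count"
rm -rf "$demo"
```

With NUL separators the stand-in receives the two non-empty files as two arguments. Without them, "./name with spaces.txt" is split into three words, so four arguments arrive and none of the three fragments names a real file.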

Explainer for creating the right-hand checksums

As above but prunes a path during the find invocation.

Explainer for rhash results comparison with comm

  • comm -3 A B

    • Compares two sorted text streams A and B.
    • -3 suppresses column 3 (lines common to both), leaving only lines unique to A (output column 1) and unique to B (output column 2).
  • Process substitution for left stream: <( grep -v '^;' /tmp/left-tmp-find-pipeline.sfv | grep . | sort )

    • grep -v '^;' /tmp/left-tmp-find-pipeline.sfv
      • Remove lines starting with ';' (rhash SFV comment/header lines).
    • | grep .
      • Remove empty lines (keep only non-blank lines).
    • | sort
      • Ensure the stream is sorted; required by comm to work correctly.
  • Process substitution for right stream: <( grep -v '^;' /tmp/right-tmp-find-pipeline.sfv | grep . | sort )

    • Same steps as left but operating on the right SFV file.
  • Pipe to less -S

    • less -S shows output with horizontal truncation (no line wrapping) and lets you scroll.
    • Useful because SFV lines contain the file path and its checksum and can be long.
  • Overall effect

    • Produce a paged view of entries present in only one SFV or the other (paths and checksums), excluding SFV comment/header lines and blank lines. This helps spot files that differ at the binary level between the left and right directory hierarchies.
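
The filter, sort and compare steps can be tried without running rhash at all. The sketch below uses two hand-written SFV files. Their contents (paths, checksums, comment text) are made up, sfv_entries is a hypothetical helper name, and the "path CRC32" line layout is assumed from the SFV format described above.

```shell
#!/usr/bin/env bash
# Sketch: filter, sort and compare two hand-written SFV files.
set -eu
export LC_ALL=C                          # assumption: same collation for sort and comm

work=$(mktemp -d)
# Made-up SFV content: ';' comment lines, a blank line, then "path CRC32" entries.
cat > "$work/left.sfv" <<'EOF'
; hypothetical header comment
;
./etc/motd 1A2B3C4D
./etc/hosts DEADBEEF

EOF
cat > "$work/right.sfv" <<'EOF'
; hypothetical header comment
./etc/hosts DEADBEEF
./etc/motd 99999999
./etc/fstab 0BADF00D
EOF

# Drop comment lines, drop blank lines, sort (hypothetical helper).
sfv_entries() { grep -v '^;' "$1" | grep . | sort; }

delta=$(comm -3 <( sfv_entries "$work/left.sfv" ) <( sfv_entries "$work/right.sfv" ))
printf '%s\n' "$delta"
rm -rf "$work"
```

In this made-up data, ./etc/hosts has the same checksum on both sides and is suppressed, ./etc/motd appears once per side because its checksum differs, and ./etc/fstab appears only in the tab-indented right-hand column.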

Explainer for rhash results comparison with vimdiff

As above but invokes a visual diff with vimdiff.
