Created August 19, 2016 19:04
Reads fdupes (-r1) output and creates relative symbolic links for each duplicate
#!/usr/bin/env python
# Reads fdupes (-r -1) output and creates relative symbolic links for each duplicate
# usage: fdupes -r1 . | ./lndupes.py
import os
import sys
from os.path import basename, dirname, join, relpath

# Each input line is one set of duplicates, with names separated by spaces.
# Note: paths that themselves contain spaces are not handled by this simple split.
for line in sys.stdin:
    files = line.strip().split(' ')
    first = files[0]
    if not first:
        continue  # skip blank lines
    print("First: %s" % first)
    for dup in files[1:]:
        # Path from the duplicate's directory to the directory of the kept file
        rel = relpath(dirname(first) or '.', dirname(dup) or '.')
        target = join(rel, basename(first))
        print("Linking duplicate: %s to %s" % (dup, target))
        # Replace the duplicate with a relative symlink to the kept file
        os.unlink(dup)
        os.symlink(target, dup)
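The core of the script above is the computation of the relative link target. As a minimal sketch, that step can be isolated in a helper so it can be checked without touching the filesystem. The function name `relative_link_target` is hypothetical and not part of the gist.

```python
import os
from os.path import basename, dirname, join


def relative_link_target(first, dup):
    """Return the symlink target that makes `dup` point at `first`.

    The target is expressed relative to the directory containing `dup`,
    which is how the kernel resolves a relative symlink.
    """
    rel = os.path.relpath(dirname(first) or '.', dirname(dup) or '.')
    return join(rel, basename(first))


if __name__ == "__main__":
    # A duplicate two levels down links back up to the kept file.
    print(relative_link_target('./a/x.txt', './b/c/y.txt'))
```

Because the target is relative, the links keep working if the whole tree is moved, but they break if only one side of a link is moved.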
fdupes isn't the right tool for snapshot deduplication. It only groups identical files by content and outputs them in arbitrary order, with no awareness of snapshot structure or relationships. When identical files exist at multiple paths within the directories being snapshotted, fdupes only considers file content, not relative paths. Resolving snapshot ordering so that only older snapshots create symlinks to newer ones does not eliminate the fundamental problem: older snapshots can still contain symlinks pointing to incorrect locations in newer ones.

For example, if you have a template README.md used across projects and someone copies it into a project without modifying it, the template in an older snapshot may end up as a symlink to the instance under that project in a newer snapshot.

Even with careful scripting, this destroys the logical integrity of your snapshots. You lose the guarantee that each snapshot is a faithful, self-contained view of the filesystem. Use tools designed for this purpose, e.g. rsync --link-dest, cp -al, or content-aware backup systems like borg or restic.
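To illustrate the path-awareness the comment asks for, here is a minimal sketch that hard-links a file in a newer snapshot to the previous snapshot only when the file at the same relative path has identical content. It is loosely inspired by the idea behind rsync --link-dest and is not how rsync is implemented. The function name `hardlink_unchanged` and the temporary-file suffix are hypothetical, and the sketch assumes both snapshots are on the same filesystem.

```python
import filecmp
import os


def hardlink_unchanged(prev_snapshot, new_snapshot):
    """Hard-link files in new_snapshot to prev_snapshot where the file at the
    SAME relative path has identical content. Returns the relative paths linked.
    """
    linked = []
    for root, _dirs, names in os.walk(new_snapshot):
        for name in names:
            new_path = os.path.join(root, name)
            rel = os.path.relpath(new_path, new_snapshot)
            old_path = os.path.join(prev_snapshot, rel)
            # Only regular, non-symlink files at the same relative path qualify.
            if os.path.islink(new_path) or os.path.islink(old_path):
                continue
            if not os.path.isfile(old_path):
                continue
            if os.path.samefile(old_path, new_path):
                continue  # already the same inode
            if filecmp.cmp(old_path, new_path, shallow=False):
                # Link under a temporary name, then rename over the new copy.
                tmp = new_path + ".lnk-tmp"  # hypothetical suffix
                os.link(old_path, tmp)
                os.replace(tmp, new_path)
                linked.append(rel)
    return sorted(linked)
```

With this approach a template README.md is never linked to a project's copy, because candidates are matched by relative path first and by content second. Hard-linked files share metadata such as permissions and modification time, so this is only appropriate when that is acceptable.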