@thinkjson
Created October 25, 2012 12:43
Check for duplicate lines in a group of files
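#!/usr/bin/env python3
# Print the MD5 hex digest of each line read from stdin, one digest per line.
# Invocation:
# cat [files] | python3 checksum_per_line.py | sort | uniq -c | sort -nr | awk '{ if ($1 > 1) print $0 }'
import sys
import hashlib

# Read raw bytes so hashing works regardless of the input encoding.
for line in sys.stdin.buffer:
    print(hashlib.md5(line).hexdigest())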
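
The invocation above leaves the counting to `sort`, `uniq -c`, and `awk`. As a rough sketch only, the same check could be done inside one Python process. The function name `duplicate_line_counts` is hypothetical and not part of the original script, and the sketch assumes the table of digests fits in memory.

```python
import hashlib
from collections import Counter


def duplicate_line_counts(lines):
    """Return {md5 hex digest: count} for every line seen more than once.

    `lines` is any iterable of bytes objects, such as sys.stdin.buffer.
    (Hypothetical helper, shown for illustration.)
    """
    # Hash each line, then tally how often each digest occurs.
    counts = Counter(hashlib.md5(line).hexdigest() for line in lines)
    # Keep only digests that occur at least twice, like the awk filter.
    return {digest: n for digest, n in counts.items() if n > 1}
```

Calling `duplicate_line_counts(sys.stdin.buffer)` on the concatenated files would give the same digests and counts that the shell pipeline prints, without the sorted ordering.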