@meeuw
Last active April 27, 2016 19:00
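Find duplicate rows in a sorted TSV by multiple key columns. The input must start with a header line and be sorted by the key columns; the script prints the first column of every row whose key fields equal those of the row before it. Example usage: mysql -Be 'select * from table order by field3,field4' | dedup.py field3,field4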
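Because the input is sorted by the key columns, rows with equal keys are always adjacent, so one pass that compares each row with its predecessor is enough to find every duplicate. A minimal sketch of that idea as a reusable function, using `itertools.groupby` to collect runs of consecutive rows with the same key tuple. The function name `adjacent_duplicates`, the list-of-lists input and the sample rows are assumptions for illustration, not part of the gist:

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Sequence


def adjacent_duplicates(rows: Iterable[Sequence[str]],
                        keys: List[int]) -> Iterator[str]:
    """Yield the first column of each row whose key columns equal the previous row's.

    Assumes `rows` is already sorted by the key columns (e.g. via ORDER BY).
    """
    def key_of(row: Sequence[str]) -> tuple:
        # Tuple of key column values; groupby starts a new run when it changes
        return tuple(row[k] for k in keys)

    for _, run in groupby(rows, key=key_of):
        next(run)  # the first row of each run is the original, not a duplicate
        for row in run:
            yield row[0]


# Hypothetical sample rows: columns are id, field3, field4
rows = [
    ["1", "a", "x"],
    ["2", "a", "x"],
    ["3", "a", "y"],
    ["4", "b", "y"],
    ["5", "b", "y"],
    ["6", "b", "y"],
]
print(list(adjacent_duplicates(rows, [1, 2])))  # ['2', '5', '6']
```

Like the script, this reports every row after the first in a run, so a key that appears three times yields two ids.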
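#!/usr/bin/env python3
# dedup.py: print the first column of every row whose key columns equal
# those of the previous row. Expects TSV on stdin with a header line,
# sorted by the key columns; keys are given as comma-separated names.
import sys

header = None
keys = []
previous = None  # the preceding data row
for line in sys.stdin:
    s = line.rstrip('\n').split('\t')  # rstrip also copes with a missing final newline
    if header is None:
        # First line is the header: translate key names to column indices
        header = s
        for a in sys.argv[1].split(','):
            keys.append(header.index(a))
        continue
    if previous is not None:
        # A row is a duplicate when every key column matches the previous row
        if all(previous[key] == s[key] for key in keys):
            print(s[0])
    previous = s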
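The script only compares each row with the one directly before it, so unsorted input silently misses duplicates that are not adjacent. If sorting is not an option, one alternative (a different technique from the gist's, trading memory for not needing ORDER BY) is to remember every key tuple seen so far in a set. A sketch under the same header-plus-TSV assumption; `hash_duplicates` and the sample data are hypothetical:

```python
import io
from typing import Iterator, List, TextIO


def hash_duplicates(stream: TextIO, key_names: List[str]) -> Iterator[str]:
    """Yield the first column of every row whose key columns were already seen.

    Works on unsorted TSV input; memory grows with the number of distinct keys.
    """
    lines = (line.rstrip("\n").split("\t") for line in stream)
    header = next(lines, None)
    if header is None:  # empty input: nothing to report
        return
    keys = [header.index(name) for name in key_names]  # ValueError if a name is missing
    seen = set()
    for row in lines:
        key = tuple(row[k] for k in keys)
        if key in seen:
            yield row[0]
        else:
            seen.add(key)


# Hypothetical unsorted input: the adjacent-row check in dedup.py finds nothing here
sample = "id\tfield3\tfield4\n1\ta\tx\n2\tb\ty\n3\ta\tx\n4\tb\ty\n5\tc\tz\n"
print(list(hash_duplicates(io.StringIO(sample), ["field3", "field4"])))  # ['3', '4']
```

On input that is already sorted by the keys this reports the same rows as the script, since a key can only have been seen before within its own run.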