@dat-boris
Created May 4, 2013 01:56
Data frequency analysis script
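#!/usr/bin/python
"""
Count how often each element occurs in a tab-separated sequence file.

Input:
    one record per line: visitor <TAB> element <TAB> t
Output:
    one line per element: element <TAB> count, highest count first
"""
if __name__ == '__main__':
    elemCount = {}
    # 'sequence' is the input file, read from the current directory.
    with open('sequence', 'r') as fh:
        for l in fh:
            # Only the element field is tallied; vis and t are unused.
            (vis, elem, t) = l.rstrip("\n").split("\t")
            if elem not in elemCount:
                elemCount[elem] = 1
            else:
                elemCount[elem] += 1
    # list.sort() sorts in place and returns None, so use sorted() instead.
    sortedElem = sorted(elemCount, key=elemCount.get, reverse=True)
    for e in sortedElem:
        print("%s\t%i" % (e, elemCount[e]))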
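The same tally can also be sketched with `collections.Counter` from the standard library, which handles both the counting and the descending sort. This is an alternative under stated assumptions, not the gist author's method: the helper name `count_elements` and the sample records are hypothetical, and skipping blank lines is an added choice that the script above does not make.

```python
from collections import Counter


def count_elements(lines):
    """Return (element, count) pairs, most frequent first.

    `lines` is any iterable of "visitor<TAB>element<TAB>t" strings.
    Hypothetical helper; the script above reads its file inline.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: skip blank lines rather than fail on unpack
        _visitor, element, _t = line.split("\t")
        counts[element] += 1
    # most_common() returns the pairs ordered by count, highest first
    return counts.most_common()


if __name__ == "__main__":
    # Made-up sample records, for illustration only
    sample = ["v1\thome\t1\n", "v2\thome\t2\n", "v1\tcart\t3\n"]
    for element, count in count_elements(sample):
        print("%s\t%i" % (element, count))
```

To run it against the real input, pass an open file handle, for example `count_elements(open('sequence'))`, since a file object iterates line by line.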