@l1m2p3
Last active January 19, 2018 06:17
This script uploads the most frequent words to Dynamo. It requires the "put_words" function from https://gist.github.com/ShawnLMP/fe2e355d5af19e17e5a21bcf356b3d45, as well as this data set: https://www.kaggle.com/rtatman/english-word-frequency/data
import csv

from dynamo_access import put_words


def get_frequent_words(dataset, numOfWords):
    """Return the numOfWords most frequent words, least frequent first."""
    pairs = []
    # Text mode with newline='' is what the csv module expects on Python 3.
    with open(dataset, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader, None)  # skip the header row
        for row in reader:
            word = row[0]
            freq = int(row[1])
            pairs.append((word, freq))
    # Sort ascending by frequency so the most frequent words end up last.
    pairs.sort(key=lambda x: x[1])
    # Keep the last numOfWords entries, or everything if fewer are available.
    return [p[0] for p in pairs[max(0, len(pairs) - numOfWords):]]


if __name__ == '__main__':
    dataset = 'unigram_freq.csv'
    numOfWords = 10
    frequent_words = get_frequent_words(dataset, numOfWords)
    put_words(frequent_words)
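
The script above sorts every (word, frequency) pair just to keep a handful of them. As a possible alternative, not part of the original gist, the selection can be sketched with heapq.nlargest from the standard library, which avoids sorting the full list. The function name top_words and the in-memory sample data are hypothetical and only there for illustration. Unlike the script above, this variant returns the most frequent word first.

```python
import csv
import heapq
import io


def top_words(csv_text_stream, num_words):
    """Return the num_words most frequent words, most frequent first.

    Hypothetical helper: expects a text stream with a header row followed
    by "word,count" rows, the same layout as unigram_freq.csv.
    """
    reader = csv.reader(csv_text_stream)
    next(reader, None)  # skip the header row
    # Lazily build (word, frequency) pairs, ignoring blank lines.
    pairs = ((row[0], int(row[1])) for row in reader if row)
    # nlargest keeps only num_words candidates in memory at a time.
    best = heapq.nlargest(num_words, pairs, key=lambda p: p[1])
    return [word for word, _ in best]


# Small in-memory sample standing in for the real data set.
sample = io.StringIO("word,count\nthe,100\nof,60\nand,80\ncat,5\n")
print(top_words(sample, 2))  # ['the', 'and']
```

To use it on the real file, pass an open file object, for example open('unigram_freq.csv', newline=''), and hand the resulting list to put_words as before.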