@satomacoto
Created November 27, 2012 03:44
create a document-term frequency matrix
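import numpy as np

# create candidate sentence set: each document is a list of terms
docs = [['a', 'b', 'c'],
        ['b', 'd']]
terms = ['a', 'b', 'c', 'd']  # vocabulary; column j of V counts terms[j]
vlist = []
n = len(docs)   # number of documents (rows of V)
d = len(terms)  # vocabulary size (columns of V)
for doc in docs:
    # map each term in the document to its index in the vocabulary
    tmp = []
    for term in doc:
        tmp.append(terms.index(term))
    # count how often each index occurs, padded to length d
    v = np.bincount(tmp, minlength=d)
    vlist.append(v)
# stack the per-document count vectors into an n x d matrix
V = np.vstack(vlist)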
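
The loop above calls `terms.index(term)` for every token, which scans the vocabulary list each time. For larger vocabularies, one possible variant builds a term-to-column dictionary once and fills the matrix directly. This is only a sketch: the function name `doc_term_matrix` is hypothetical (not part of the gist), and it assumes every term in `docs` appears in `terms`.

```python
import numpy as np

def doc_term_matrix(docs, terms):
    """Return an (n_docs x n_terms) integer matrix of term counts."""
    # hypothetical helper: map each term to its column index once
    index = {term: j for j, term in enumerate(terms)}
    V = np.zeros((len(docs), len(terms)), dtype=int)
    for i, doc in enumerate(docs):
        for term in doc:
            # raises KeyError if a term is missing from the vocabulary
            V[i, index[term]] += 1
    return V

docs = [['a', 'b', 'c'],
        ['b', 'd']]
terms = ['a', 'b', 'c', 'd']
V = doc_term_matrix(docs, terms)
print(V)
# [[1 1 1 0]
#  [0 1 0 1]]
```

Each row is one document and each column one term, so `V[1, 3]` is the count of `'d'` in the second document.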