@cigrainger
Created February 27, 2014 22:35
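import re

import textmining


def cleantext(lines):
    """Return the first CSV field of each line, lowercased, keeping only letters and spaces."""
    cleaned = []
    for line in lines:
        # Keep only the text before the first comma (the first CSV column).
        title = line.split(',', 1)[0]
        # Strip every character that is not an ASCII letter or a space.
        cleaned.append(re.sub(r'[^A-Za-z ]+', '', title).lower())
    return cleaned


def termdocumentmatrix(docs):
    """Build a term-document matrix from docs and write it to a CSV file."""
    tdm = textmining.TermDocumentMatrix()
    for doc in docs:
        # Add one document at a time, not the whole list.
        tdm.add_doc(doc)
    # cutoff=2 keeps only terms that appear in at least two documents.
    tdm.write_csv('patenttitlematrix.csv', cutoff=2)


with open('patentstitles.csv', 'r') as f:
    titles = cleantext(f)
termdocumentmatrix(titles)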