@raullenchai
Last active December 15, 2015 07:19
Dump the raw Reuters corpus from NLTK into a tab-separated file compatible with MALLET's topic modeling
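from nltk.corpus import reuters

# One document per line: <file id> TAB <space-separated categories> TAB <text>
with open('myfile', 'w') as f:
    for fileid in reuters.fileids():
        labels = ' '.join(reuters.categories(fileid))
        # Collapse newlines, tabs, and repeated spaces into single spaces so each
        # document stays on one line and words at line breaks are not glued together.
        body = ' '.join(reuters.raw(fileid).split())
        f.write(fileid + '\t' + labels + '\t' + body + '\n')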
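
Before handing the file to MALLET, it can help to check that each line splits back into the three fields the script writes (ID, labels, text). The following is a minimal sketch under that assumption. The helper name `parse_line` and the sample line are made up for illustration and are not part of NLTK or MALLET.

```python
def parse_line(line):
    """Split one exported line into (doc_id, labels, text).

    Assumes the layout: <id> TAB <space-separated labels> TAB <text>.
    """
    # maxsplit=2 keeps any stray tabs inside the text field instead of
    # raising on a line with more than three tab-separated parts.
    doc_id, labels, text = line.rstrip('\n').split('\t', 2)
    return doc_id, labels.split(), text


# Made-up sample line in the same layout as the export (not real corpus data)
sample = 'doc/1\tearn trade \tSome text here\n'
print(parse_line(sample))
```

A line with fewer than three fields raises `ValueError` here, which is one way to spot documents that would be misread on import.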