@cigrainger
Created April 17, 2014 11:56
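import re

# Keep only ASCII letters and spaces; digits, punctuation, and newlines are removed.
pattern = re.compile(r'[^a-zA-Z ]')

# Raw strings keep the Windows backslashes literal.
in_path = r"C:\Users\graingec\spillovers\abstracts\abstracts.csv"
out_path = r"C:\Users\graingec\spillovers\abstracts\abstracts.txt"

abstracts = []
with open(in_path, "r") as f:
    for line in f:
        # Split on the first comma only: everything after the first field is the abstract.
        y = line.split(',', 1)
        if len(y) == 2:
            abstracts.append(y[1])

# Drop the first (header) row, strip the '<image>' placeholder, lowercase, remove non-letters.
abstracts = [pattern.sub('', a.replace('<image>', '').lower()) for a in abstracts[1:]]

with open(out_path, "w") as f:
    for a in abstracts:
        # The regex strips each trailing newline, so write one abstract per line explicitly.
        f.write(a + "\n")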