@nazt
Created December 13, 2009 11:11
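// Parse TIME.ALL into a map of documents keyed by id, dropping stopwords from the text.
file = new File('TIME.ALL')
stopwordsFile = new File('stopwords.txt')

// Load the stopword list once, upper-cased, into a set for fast lookup.
stopwords = stopwordsFile.readLines().collect { it.trim().toUpperCase() } as Set
def isStopWords = { word -> stopwords.contains(word.toString().toUpperCase()) }

// Header line format: *TEXT <id> <date> PAGE <page>
// Capture groups: 1 = "*TEXT", 2 = id, 3 = date, 4 = page.
def TEXT = /^(\*TEXT)\s+([0-9]+)\s+([0-9]+\/[0-9]+\/[0-9]+)\s+PAGE\s+([0-9]+)/
def contents = [:]
id = 1
println file.size()

file.eachLine { line ->
    def m = (line =~ TEXT)
    if (m) {
        // A header starts a new document entry.
        id = m[0][2].toInteger()
        contents[id] = [DATE: m[0][3], PAGE: m[0][4], DATA: new StringBuilder()]
    } else if (line.trim() != "*STOP" && contents[id]) {
        // Body line: append only the words that are not stopwords.
        line.tokenize().each { word ->
            if (!isStopWords(word)) contents[id].DATA << word << " "
        }
    }
}

// Print the filtered text of document 563, if it exists.
println contents[563]?.DATA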