@bozhink
Created December 16, 2016 11:12
Calculate word frequency
// source: https://blog.heroku.com/kafka-data-pipelines-frp-node?c=7013A000000tyBBQAY&utm_campaign=Newsletter_December_2016&utm_medium=email&utm_source=newsletter&utm_content=blog&utm_term=kafka-data-pipelines-frp-node
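// Assumes lodash, since the snippet calls _.replace and _.includes.
const _ = require('lodash');

// NOTE: `stopWords` (an array of lowercase words to ignore) is not defined in
// this snippet; it must be in scope before wordFreq is called.

// Reducer: folds the word counts of `string` into `accumulator` and returns it.
function wordFreq(accumulator, string) {
  return _.replace(string, /[.!?"'#,():;-]/g, '') // remove special characters
    .split(/\s+/)
    .map(word => word.toLowerCase())
    .filter(word => !_.includes(stopWords, word)) // drop words in the stop list
    .filter(word => word.length >= 2) // drop empty and single-character words
    .filter(word => !/\d/.test(word)) // drop words containing a digit
    .filter(word => !word.includes('http')) // drop words containing "http"
    .filter(word => !word.includes('@')) // drop words containing "@"
    .reduce((map, word) =>
      Object.assign(map, {
        [word]: map[word] ? map[word] + 1 : 1,
      }), accumulator
    );
}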
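
Because wordFreq takes an accumulator first and a string second, it can be passed directly to Array.prototype.reduce to keep a running count over many messages. The following is a minimal sketch of that usage, not part of the original gist. To stay self-contained it uses a dependency-free variant, here called wordFreqPlain (a hypothetical name), and a made-up stopWords list and sample messages. It also guards the counter with hasOwnProperty so that a word such as "constructor" does not pick up an inherited property, which is an addition to the original logic.

```javascript
// Hypothetical stop list for illustration; the gist does not define `stopWords`.
const stopWords = ['the', 'and', 'is', 'of', 'to'];

// Dependency-free variant of wordFreq: same filters, built-in methods only.
function wordFreqPlain(accumulator, string) {
  return string
    .replace(/[.!?"'#,():;-]/g, '') // remove special characters
    .split(/\s+/)
    .map(word => word.toLowerCase())
    .filter(word => !stopWords.includes(word)) // drop words in the stop list
    .filter(word => word.length >= 2) // drop empty and single-character words
    .filter(word => !/\d/.test(word)) // drop words containing a digit
    .filter(word => !word.includes('http')) // drop words containing "http"
    .filter(word => !word.includes('@')) // drop words containing "@"
    .reduce((map, word) => {
      // Only count own properties, so inherited keys are not mistaken for counts.
      const seen = Object.prototype.hasOwnProperty.call(map, word);
      map[word] = seen ? map[word] + 1 : 1;
      return map;
    }, accumulator);
}

// Made-up sample input: fold a list of messages into one running count.
const messages = [
  'Kafka is fast, and Kafka is durable!',
  'See http://example.com or ping @someone about Kafka 2016.',
];
const counts = messages.reduce(wordFreqPlain, {});
console.log(counts);
```

Note that the accumulator is mutated in place and returned, so passing a fresh `{}` as the initial value avoids sharing counts between unrelated runs.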