Created October 16, 2013 01:49
Example of using gramophone ngrams as input into Natural's TF-IDF function. It's adapted from the example in the Natural docs. NOTE that all arguments to the tfidf methods must be arrays. If they are strings, the tfidf object will run the default tokenizer over the argument. I might submit a pull request to natural so that `natural.TfIdf` could …
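The array-versus-string note above can be illustrated without either library. The following is a minimal sketch, not Natural's actual implementation: `simpleTokenize`, `toTerms`, and `termCounts` are hypothetical helpers, the tokenizer is a plain regex split standing in for a default word tokenizer, and the term list is hand-written rather than real gramophone output.

```javascript
// Hypothetical stand-in for a default word tokenizer (an assumption, not
// Natural's code): lowercases the input and splits on non-alphanumerics.
function simpleTokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Arrays are taken as ready-made terms; strings are tokenized first.
function toTerms(input) {
  return Array.isArray(input) ? input : simpleTokenize(input);
}

// Count how often each query term occurs in a document's term list.
function termCounts(query, docTerms) {
  const counts = {};
  toTerms(query).forEach(function (term) {
    counts[term] = docTerms.filter(function (t) { return t === term; }).length;
  });
  return counts;
}

// A document already reduced to ngram terms (hand-written for illustration).
const docTerms = ['node programming language', 'node', 'programming language'];

// Array query: the three-word ngram is looked up as one term.
const asArray = termCounts(['node programming language'], docTerms);

// String query: the ngram is split into three single-word terms first.
const asString = termCounts('node programming language', docTerms);

console.log(asArray);
console.log(asString);
```

In this sketch the array query finds the full ngram once, while the string query is broken into `node`, `programming`, and `language`, of which only `node` matches a stored term. That is the mismatch the note warns about when ngrams are passed as strings.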
var natural = require('natural'),
    TfIdf = natural.TfIdf,
    tfidf = new TfIdf();
var gramophone = require('gramophone');

var docs = [
  'this document is about node programming language.',
  'this document is about ruby programming language.',
  'this document is about the ruby programming language and node programming language.',
  'this document is about node programming language. it has node programming language examples'
];

docs.forEach(function(doc, index){
  var ngrams = gramophone.extract(doc, { min: 1, flatten: true });
  console.error('ngrams for doc ' + index + ':');
  console.error(ngrams);
  // Pass the ngrams as an array so tfidf does not run its default tokenizer over them.
  tfidf.addDocument(ngrams);
});

console.log('node programming language -----------');
// Array argument: the ngram is passed through as a single term.
tfidf.tfidfs(['node programming language'], function(i, measure) {
  console.log('document #' + i + ' is ' + measure);
});

console.log('"document" --------------------------------');
// String argument: tfidf runs the default tokenizer over it.
tfidf.tfidfs('"document"', function(i, measure) {
  console.log('document #' + i + ' is ' + measure);
});
running this produces the following output:
modules:
Am I missing something?
Edit: I have the same problem using only natural's tf-idf example, so my problem has nothing to do with gramophone.