@NaelsonDouglas
Created February 4, 2019 03:36
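from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Placeholder stop words; a plain list is the documented type for stop_words.
custom_stop_words = ["palavra1", "palavra2", "palavra3"]

# Bag-of-words: each line of the file is treated as one document.
vectorizer = CountVectorizer(stop_words=custom_stop_words)
with open('./texto_aleatorio.txt') as data:
    vectorizer.fit(data)
print(vectorizer.vocabulary_)  # maps each term to its column index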
# TF-IDF: learns the vocabulary plus an inverse document frequency per term.
custom_stop_words = ["palavra1", "palavra2", "palavra3"]
vectorizer = TfidfVectorizer(stop_words=custom_stop_words)
with open('./texto_aleatorio.txt') as data:
    vectorizer.fit(data)
print(vectorizer.vocabulary_)
print(vectorizer.idf_)  # one IDF weight per vocabulary term
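# Encode the whole file as a single document with the fitted TF-IDF vectorizer.
with open('./texto_aleatorio.txt') as data:
    vector = vectorizer.transform([data.read()])
print(vector.shape)  # (1, number of vocabulary terms)
print('\n')
print(vector.toarray())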
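
The script above depends on a local file, so here is a minimal self-contained sketch of the same two vectorizers that runs without it. The `corpus` and `stop_words` values are hypothetical stand-ins for the lines of `texto_aleatorio.txt` and for `custom_stop_words`; they are not from the original gist.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical in-memory corpus: each string plays the role of one line of the file.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
# Hypothetical stop-word list, analogous to custom_stop_words in the script.
stop_words = ["the", "on", "and"]

# Bag-of-words counts: fit_transform learns the vocabulary and encodes the corpus.
count_vectorizer = CountVectorizer(stop_words=stop_words)
counts = count_vectorizer.fit_transform(corpus)

# TF-IDF weights over the same corpus and stop words.
tfidf_vectorizer = TfidfVectorizer(stop_words=stop_words)
tfidf = tfidf_vectorizer.fit_transform(corpus)

# Column order follows the alphabetically sorted vocabulary.
vocabulary = sorted(count_vectorizer.vocabulary_)
print(vocabulary)
print(counts.toarray())
print(tfidf.shape)
```

The stop words never appear in `vocabulary`, and both matrices have one row per document and one column per remaining term.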