Last active
February 8, 2017 14:41
code for "how to get the similarity you need" from https://speakerdeck.com/tmylk/wordrank-pydata-5-min-talk
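# Run this at the end of 01_pride_and_predjudice.ipynb from https://github.com/cytora/pycon-nlp-in-10-lines
# (processed_text comes from that notebook).
import multiprocessing
from gensim.models import Word2Vec

processed_sentences = [sent.lemma_.split() for sent in processed_text.sents]
# Small window: nearest neighbours tend to be interchangeable words.
interchangeable_words_model = Word2Vec(
    sentences=processed_sentences,
    workers=max(1, multiprocessing.cpu_count() - 1),  # use your cores, but at least one
    window=2, sg=1)
# Large window: nearest neighbours tend to be attributes of the query word.
attributes_of_model = Word2Vec(
    sentences=processed_sentences,
    workers=max(1, multiprocessing.cpu_count() - 1),  # use your cores, but at least one
    window=50, sg=1)
# Query through .wv; calling most_similar on the model itself was removed in gensim 4.
attributes_of_model.wv.most_similar(u'darcy')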
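
The two models above differ only in `window`, so it may help to see what that parameter changes. The sketch below is a plain-Python illustration, not gensim's implementation: the helper `context_pairs` and the toy sentence are made up for this example. It lists the context words that a skip-gram style model would pair with `darcy` for a narrow and a wide window. As far as I know, gensim also varies the effective window size during training, which this sketch ignores.

```python
def context_pairs(tokens, window):
    """Yield (target, context) pairs for tokens at most `window` positions apart.

    Hypothetical helper for illustration only; not part of gensim.
    """
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, tokens[j]


# Toy lemmatized sentence, invented for this example.
sentence = "mr darcy be a proud man with a large estate in derbyshire".split()

# Context words seen for "darcy" with the two window sizes used above.
narrow = {c for t, c in context_pairs(sentence, window=2) if t == "darcy"}
wide = {c for t, c in context_pairs(sentence, window=50) if t == "darcy"}

print(sorted(narrow))
print(sorted(wide))
```

With `window=2` only the immediate neighbours are paired with `darcy`, so words that occur in the same syntactic slot end up with similar vectors. With `window=50` the whole sentence becomes context, so words that share a topic or describe the same entity are pulled together instead.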