пушистый
и
пудель
котенок
громкий
мяукал
лаял
большой
бегал
мурлыкал
# Run this at the end of 01_pride_and_predjudice.ipynb from https://github.com/cytora/pycon-nlp-in-10-lines
import multiprocessing

from gensim.models import Word2Vec

# One list of lemmas per sentence
processed_sentences = [sent.lemma_.split() for sent in processed_text.sents]
interchangeable_words_model = Word2Vec(
    sentences=processed_sentences,
    workers=max(1, multiprocessing.cpu_count() - 1),  # use your cores, but never fewer than one worker
    window=2, sg=1)
attributes_of_model = Word2Vec(
    sentences=processed_sentences,
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (C) 2011 Radim Rehurek <[email protected]>
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html
#
# Parts of the LDA inference code come from Dr. Hoffman's `onlineldavb.py` script,
# (C) 2010 Matthew D. Hoffman, GNU GPL 3.0
from time import time
import logging
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from gensim.matutils import Sparse2Corpus
from gensim.models.ldamodel import LdaModel