import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def create_heatmap(similarity, cmap="YlGnBu"):
    """Plot a labelled 4x4 similarity matrix as a heatmap."""
    df = pd.DataFrame(similarity)
    # Row/column order of the matrix: john 0, luke 1, mark 2, matt 3
    df.columns = ['john', 'luke', 'mark', 'matt']
    df.index = ['john', 'luke', 'mark', 'matt']
    fig, ax = plt.subplots(figsize=(5, 5))
    # Draw on the axes created above rather than relying on the current axes
    sns.heatmap(df, cmap=cmap, ax=ax)
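The snippet above imports `CountVectorizer` and `cosine_similarity`, presumably to build the matrix that `create_heatmap` plots. As a rough illustration of what that pair computes, here is a dependency-free sketch. It is not the scikit-learn implementation: the helper names (`bag_of_words`, `cosine`, `similarity_matrix`) and the four toy documents are hypothetical, and the tokenizer only approximates `CountVectorizer`'s default of lowercased tokens with two or more word characters.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Lowercase and keep tokens of two or more word characters,
    # similar to CountVectorizer's default token pattern.
    return Counter(re.findall(r"\b\w\w+\b", text.lower()))


def cosine(a, b):
    # Dot product over the shared vocabulary, divided by the vector norms.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(documents):
    # Pairwise cosine similarity as a nested list (rows and columns follow input order).
    bags = [bag_of_words(d) for d in documents]
    return [[cosine(a, b) for b in bags] for a in bags]


# Hypothetical placeholder documents, one per heatmap label.
docs = [
    "the cat sat on the mat",
    "a dog barked loudly",
    "the dog sat",
    "cat and dog",
]
similarity = similarity_matrix(docs)
```

Because `pd.DataFrame` accepts a nested list, a matrix of this shape could be passed straight to `create_heatmap(similarity)`.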
def get_word_frequencies(text):
    r"""Return a Counter with the most common words
    in a given text.

    Parameters
    ----------
    text: list of str, e.g. df['text'].tolist()

    Returns
    -------
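The body of `get_word_frequencies` is cut off in this listing, so the following is only a minimal sketch of how such a function might count words with `collections.Counter`. The name `word_frequencies_sketch`, the simple `[a-z]+` tokenizer, and the sample texts are assumptions made for illustration, not the author's code.

```python
import re
from collections import Counter


def word_frequencies_sketch(texts):
    # Accumulate lowercase word counts over a list of documents.
    counts = Counter()
    for doc in texts:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


# Hypothetical input standing in for df['text'].tolist()
sample_texts = ["The quick brown fox", "the lazy dog", "The fox jumps"]
frequencies = word_frequencies_sketch(sample_texts)
top_two = frequencies.most_common(2)
```

Returning the full `Counter` leaves the caller free to choose how many words to keep through `most_common(n)`.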
import multiprocessing

import pandas as pd

# use the documents' list as a column in a dataframe
df = pd.DataFrame(data, columns=["text"])


def get_word2vec(text):
    r"""
    Parameters
    ----------
    text: list of str, text from the dataframe, df['text'].tolist()
    """
    # Use one worker per available CPU core
    num_workers = multiprocessing.cpu_count()
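The rest of `get_word2vec` is not shown here. Assuming it goes on to train a gensim `Word2Vec` model, the steps might look roughly like the sketch below. The helper names, the tokenizer, and the hyperparameter values (`vector_size=100`, `window=5`, `min_count=1`) are illustrative assumptions, and the `vector_size` keyword assumes gensim 4 or later.

```python
import multiprocessing
import re


def tokenize_documents(texts):
    # Word2Vec expects an iterable of token lists, one list per document.
    return [re.findall(r"[a-z]+", doc.lower()) for doc in texts]


def train_word2vec_sketch(sentences, num_workers):
    # Imported lazily so the tokenization part runs without gensim installed.
    from gensim.models import Word2Vec

    # Hyperparameters are placeholder assumptions, not values from the source.
    return Word2Vec(
        sentences=sentences,
        vector_size=100,
        window=5,
        min_count=1,
        workers=num_workers,
    )


# Hypothetical input standing in for df['text'].tolist()
sample_texts = ["The service was great", "Great food, slow service"]
sentences = tokenize_documents(sample_texts)
num_workers = multiprocessing.cpu_count()

# With gensim available, training would be:
# model = train_word2vec_sketch(sentences, num_workers)
```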
def format_topics_sentences(ldamodel, corpus):
    r"""Associate each review with its dominant topic.

    Parameters
    ----------
    ldamodel: gensim LDA model
        The current LDA model
    corpus: gensim corpus
        The corpus built from the reviews
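The function body is truncated in this listing. To illustrate the idea of a dominant topic, the sketch below assumes each document's topic distribution is a list of `(topic_id, probability)` pairs, which is the shape gensim's `LdaModel.get_document_topics` returns. The function name `dominant_topics` and the example distributions are hypothetical.

```python
def dominant_topics(doc_topic_distributions):
    # For each document, keep the topic with the highest probability.
    rows = []
    for doc_id, topics in enumerate(doc_topic_distributions):
        if not topics:
            # No topic passed the model's probability threshold for this document.
            rows.append((doc_id, None, 0.0))
            continue
        topic_id, prob = max(topics, key=lambda pair: pair[1])
        rows.append((doc_id, topic_id, prob))
    return rows


# Hypothetical distributions standing in for the model's per-document output.
example = [
    [(0, 0.10), (1, 0.85), (2, 0.05)],
    [(0, 0.60), (2, 0.40)],
    [],
]
result = dominant_topics(example)
```

Each row is `(document index, dominant topic id, probability)`, a shape that converts easily to a DataFrame for joining back onto the review text.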
| Design | Method | HIV |
|---|---|---|