@Rafastoievsky
Created November 2, 2020 03:53
WhatsApp group chat analysis: getting common words
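import numpy as np
import pandas as pd
import plotly.express as px
from collections import Counter
from nltk.corpus import stopwords

# `chat` is the parsed chat DataFrame; only its Author and Message columns are needed here.
commond_words = chat[['Author','Message']].copy()

# Spanish stopwords (requires the NLTK "stopwords" corpus) plus chat-specific noise tokens,
# including the pieces of WhatsApp's "<Multimedia omitido>" (media omitted) placeholder.
STOPWORDS = stopwords.words('spanish')
extra = ["<multimedia", "omitido>", "k", "d","si","multimedia", "omitido"]
# Note: this rebinds the name `stopwords` from the NLTK module to a plain list.
stopwords = list(STOPWORDS) + extra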
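# Lowercase each message, split it into words, and drop the stopwords.
commond_words["Message"] = (commond_words["Message"]
    .str.lower()
    .str.split()
    .apply(lambda x: [item for item in x if item not in stopwords])
)
# Explode the whole frame so every word gets its own row and keeps its Author.
# Assigning an exploded, re-indexed Series back to the column would align on the
# original index and silently drop most of the words.
commond_words = commond_words.explode("Message").reset_index(drop=True)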
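# remove_emoji is not defined in this snippet; it must already exist in the session.
commond_words['Message']= commond_words['Message'].apply(remove_emoji)
# Treat leftover 'nan' strings and empty tokens as missing values (np.NaN was removed in NumPy 2.0).
commond_words['Message']= commond_words['Message'].replace('nan', np.nan)
commond_words['Message']= commond_words['Message'].replace('', np.nan)
# Collapse laughter variants ("jajaja", "ajaj", ...) into "jaja"; pandas 2.0+ needs regex=True.
# Note: the patterns are unanchored, so they also rewrite "ja" inside longer words (e.g. "trabajar").
commond_words['Message']= commond_words.Message.str.replace(r"(a|j)?(ja)+(a|j)?", "jaja", regex=True)
commond_words['Message']= commond_words.Message.str.replace(r"(a|j)?(jaja)+(a|j)?", "jaja", regex=True)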
# Count each remaining word (missing values excluded), most frequent first.
word_counts = Counter(commond_words.Message.dropna())
words_dict = pd.DataFrame(word_counts.most_common(), columns=['words', 'count'])
# Bar chart of the ten most frequent words; drop missing values first so ten bars remain.
fig = px.bar(words_dict.dropna().head(10), x='words', y='count',
             labels={'words':'Common Words'},
             height=400)
fig.update_traces(marker_color='#EDCC8B', marker_line_color='#D4A29C',
                  marker_line_width=1.5, opacity=0.6)
fig.update_layout(title_text='Common Words Chart')
fig.show()
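
The script calls remove_emoji without defining it, and its laughter patterns are unanchored. The block below is a minimal sketch of two token-level helpers that could fill those gaps. Both are assumptions rather than the author's code: this remove_emoji is a hypothetical stand-in that covers only common emoji ranges, and normalize_laugh is an alternative that rewrites a token only when the whole token is laughter.

```python
import re

# Hypothetical stand-in for the remove_emoji helper that the script calls
# but does not define; the ranges below cover common emoji blocks only.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator (flag) letters
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\u200d\ufe0f"           # zero-width joiner and variation selector
    "]+"
)


def remove_emoji(token):
    """Strip emoji from a token; non-string values (e.g. NaN) pass through."""
    if not isinstance(token, str):
        return token
    return EMOJI_PATTERN.sub("", token)


# Same laughter pattern as the script, but applied with fullmatch so that
# words which merely contain "ja" are left alone.
LAUGH_PATTERN = re.compile(r"(?:a|j)?(?:ja)+(?:a|j)?")


def normalize_laugh(token):
    """Collapse a token made up entirely of laughter into 'jaja'."""
    if isinstance(token, str) and LAUGH_PATTERN.fullmatch(token):
        return "jaja"
    return token


print(remove_emoji("hola😂"))        # hola
print(normalize_laugh("jajajaja"))   # jaja
print(normalize_laugh("trabajar"))   # trabajar
```

Either helper can be applied per token with commond_words['Message'].apply(...). The trade-off of matching whole tokens is that laughter with attached punctuation, such as "jajaja,", is left unchanged.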