@fclesio
Created July 3, 2019 10:45
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer


def get_word_ngrams_list(df, artist, word_ngram):
    """Return the 20 most frequent word n-grams in one artist's lyrics."""

    def get_top_word_n_bigram(corpus, n=None):
        # Count only n-grams of exactly `word_ngram` words.
        vec = CountVectorizer(ngram_range=(word_ngram, word_ngram)).fit(corpus)
        bag_of_words = vec.transform(corpus)
        # Sum each column to get total occurrences across all lyrics.
        sum_words = bag_of_words.sum(axis=0)
        words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
        words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
        return words_freq[:n]

    common_words = get_top_word_n_bigram(df[df['artist'] == artist]['lyric'], 20)
    df3 = pd.DataFrame(common_words, columns=['ngram', 'qty'])
    return df3
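
To see what the n-gram counting amounts to without installing scikit-learn, here is a rough, dependency-free sketch of the same idea. It is not the gist's method: it swaps `CountVectorizer` for `collections.Counter`, and the names `top_word_ngrams`, `TOKEN_RE`, and `sample_lyrics` are hypothetical. The regex is an assumption meant to approximate `CountVectorizer`'s default tokenization (lowercased words of two or more characters), and the ordering of tied n-grams may differ from the original.

```python
import re
from collections import Counter

# Assumed approximation of CountVectorizer's default token pattern:
# runs of two or more word characters.
TOKEN_RE = re.compile(r"\b\w\w+\b")


def top_word_ngrams(lyrics, ngram_size, top_n=20):
    """Count word n-grams of exactly `ngram_size` words across all lyrics."""
    counts = Counter()
    for text in lyrics:
        tokens = TOKEN_RE.findall(text.lower())
        # Slide a window of `ngram_size` tokens over each lyric separately,
        # so n-grams never span two different lyrics.
        for i in range(len(tokens) - ngram_size + 1):
            counts[" ".join(tokens[i:i + ngram_size])] += 1
    return counts.most_common(top_n)


# Made-up sample data, for illustration only.
sample_lyrics = [
    "hello darkness my old friend",
    "hello darkness my new friend",
]
top_bigrams = top_word_ngrams(sample_lyrics, ngram_size=2, top_n=3)
print(top_bigrams)
```

The returned list of `(ngram, count)` pairs has the same shape as `common_words` in the function above, so it could be passed to `pd.DataFrame(..., columns=['ngram', 'qty'])` in the same way.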