Last active
November 23, 2021 17:20
andrea-dagostino/9949889eeaf86915182064dd1bb3870f
posts/raggruppamento-testuale-con-tf-idf
import numpy as np
import pandas as pd

def get_top_keywords(n_terms):
    """Print the top keywords for each KMeans centroid."""
    # X (TF-IDF matrix), clusters (cluster labels) and vectorizer must already be defined in the enclosing scope
    df = pd.DataFrame(X.toarray()).groupby(clusters).mean()  # average the TF-IDF vectors within each cluster
    terms = vectorizer.get_feature_names_out()  # terms of the TF-IDF vocabulary
    for i, r in df.iterrows():
        print('\nCluster {}'.format(i))
        # for each row of the dataframe, find the n terms with the highest scores (printed in ascending order)
        print(','.join([terms[t] for t in np.argsort(r.to_numpy())[-n_terms:]]))

get_top_keywords(10)
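The function above reads `X`, `clusters` and `vectorizer` from the surrounding scope and prints its results. A minimal sketch of a variant that takes these as arguments and returns the keywords instead may be easier to reuse and test. The name `top_keywords_per_cluster`, its parameters and the toy data are hypothetical and not part of the original gist; it uses only NumPy and, unlike the original, lists each cluster's terms from highest to lowest score.

```python
import numpy as np

def top_keywords_per_cluster(X, labels, terms, n_terms=10):
    """Return {cluster_label: [terms ordered by descending mean TF-IDF score]}."""
    # Accept both scipy sparse matrices and plain arrays.
    dense = X.toarray() if hasattr(X, "toarray") else np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    terms = np.asarray(terms)
    keywords = {}
    for label in np.unique(labels):
        # Mean TF-IDF vector of the documents assigned to this cluster.
        centroid = dense[labels == label].mean(axis=0)
        # Indices of the n highest scores, highest first.
        top = np.argsort(centroid)[::-1][:n_terms]
        keywords[int(label)] = terms[top].tolist()
    return keywords

# Toy data (hypothetical): 4 documents, 3 terms, 2 clusters.
toy_X = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.0],
    [0.0, 0.2, 0.7],
    [0.1, 0.0, 0.9],
])
toy_labels = [0, 0, 1, 1]
toy_terms = ["cat", "dog", "fish"]

result = top_keywords_per_cluster(toy_X, toy_labels, toy_terms, n_terms=2)
print(result)
```

With real data, the call would presumably look like `top_keywords_per_cluster(X, clusters, vectorizer.get_feature_names_out(), 10)`, assuming `X`, `clusters` and `vectorizer` are the same objects the original function relies on.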