Last active
October 12, 2019 01:21
-
-
Save simonlindgren/e041443c8e5ce98712d33b61f4ace44a to your computer and use it in GitHub Desktop.
Sentiment analysis with VADER
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Sentiment analysis with VADER\n", | |
"[VADER](https://github.com/cjhutto/vaderSentiment) (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.\n", | |
"\n", | |
"First, import and set up VADER, and pandas." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer\n", | |
"analyzer = SentimentIntensityAnalyzer()\n", | |
"\n", | |
"import pandas as pd" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We read a text file where each line is a sentence (or short social media post) into a list." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sentences = open('tweets.txt', 'r').readlines()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Remove duplicates from the sentence list, **_if_** that's what we want to do." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sentences = set(sentences)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Perform the sentiment analysis and write the scores as a [list](https://docs.python.org/3/tutorial/introduction.html#lists) of [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"vader_scores = [analyzer.polarity_scores(sentence) for sentence in sentences]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Iterate over the `sentences` list and the `vader_scores` list in parallel, to be able to add each sentence as a key to the dictionary of its scores." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"for sentence, score_dict in zip(sentences, vader_scores):\n", | |
" score_dict['text'] = sentence" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now `vader_scores` is a list of dictionaries with scores and sentences. We write it to a [pandas dataframe](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"vader_df = pd.DataFrame(vader_scores)[['text', 'compound', 'neg', 'neu', 'pos']]\n", | |
"vader_df = vader_df.sort_values('compound', ascending=False)\n", | |
"vader_df.head(10)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Save to csv." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"vader_df.to_csv(\"vader_sentiments.csv\", index = False)" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.3" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment