@simonlindgren
Last active October 12, 2019 01:21
Sentiment analysis with VADER
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sentiment analysis with VADER\n",
"[VADER](https://github.com/cjhutto/vaderSentiment) (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.\n",
"\n",
"First, import pandas and VADER, and create a `SentimentIntensityAnalyzer` instance to score the text with."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer\n",
"\n",
"analyzer = SentimentIntensityAnalyzer()"
]
},
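{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before scoring the whole file, it can help to look at the output for a single sentence. `polarity_scores` returns a dictionary with four keys: `neg`, `neu` and `pos` are the proportions of the text that fall into each category (they sum to roughly 1), and `compound` is a normalized overall score between -1 (most negative) and +1 (most positive). The example sentence below is an arbitrary illustration, not part of the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# An arbitrary example sentence, used only for illustration\n",
"example = 'VADER is smart, handsome, and funny!'\n",
"# Returns a dict with the keys 'neg', 'neu', 'pos' and 'compound'\n",
"analyzer.polarity_scores(example)"
]
},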
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read a text file into a list, where each line of the file is one sentence (or short social media post)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
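"# Assumes a UTF-8 encoded file; 'with' closes the file when done\n",
"with open('tweets.txt', encoding='utf-8') as f:\n",
"    # Strip trailing newlines and skip blank lines\n",
"    sentences = [line.strip() for line in f if line.strip()]"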
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**_Optionally_**, remove duplicate sentences from the list. Skip this step if repeated posts should each be counted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
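"# dict.fromkeys keeps the first occurrence of each sentence, in the original order\n",
"sentences = list(dict.fromkeys(sentences))"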
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perform the sentiment analysis and store the scores as a [list](https://docs.python.org/3/tutorial/introduction.html#lists) of [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries), one dictionary per sentence."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vader_scores = [analyzer.polarity_scores(sentence) for sentence in sentences]"
]
},
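{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `compound` score, which we sort on later, is a single normalized summary of each sentence. In the vaderSentiment source code, the rule-adjusted word valences (plus a boost for emphatic punctuation) are summed into a raw score $x$, which is then mapped into the range $(-1, 1)$ as $x / \\sqrt{x^2 + \\alpha}$ with $\\alpha = 15$.\n",
"\n",
"The cell below is a minimal re-implementation of that normalization step, for illustration only: `normalize_compound` is our own name, not part of the vaderSentiment API, and the analysis does not need it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def normalize_compound(x, alpha=15):\n",
"    # Map a raw valence sum x into (-1, 1): x / sqrt(x^2 + alpha)\n",
"    score = x / math.sqrt(x * x + alpha)\n",
"    # Clamp defensively to the bounds of VADER's compound score\n",
"    return max(-1.0, min(1.0, score))\n",
"\n",
"# A raw sum of 0 stays 0; larger sums approach +1 or -1\n",
"[round(normalize_compound(x), 4) for x in (-8, -2, 0, 2, 8)]"
]
},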
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Iterate over the `sentences` and `vader_scores` lists in parallel, adding each sentence to its own score dictionary under the key `'text'`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for sentence, score_dict in zip(sentences, vader_scores):\n",
" score_dict['text'] = sentence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
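"Now `vader_scores` is a list of dictionaries holding both the scores and the sentences. We load it into a [pandas dataframe](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame), put the columns in a fixed order, sort the rows by compound score (most positive first) and show the top ten."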
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vader_df = pd.DataFrame(vader_scores)[['text', 'compound', 'neg', 'neu', 'pos']]\n",
"vader_df = vader_df.sort_values('compound', ascending=False)\n",
"vader_df.head(10)"
]
},
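{
"cell_type": "markdown",
"metadata": {},
"source": [
"To turn the compound score into a category, the vaderSentiment README suggests these thresholds: positive if `compound >= 0.05`, negative if `compound <= -0.05`, and neutral in between. The cell below is a sketch that applies them; the `classify` function and the `label` column are our own additions, and the threshold is a convention that can be tuned for the data at hand."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def classify(compound, threshold=0.05):\n",
"    # Thresholds follow the convention suggested in the vaderSentiment README\n",
"    if compound >= threshold:\n",
"        return 'positive'\n",
"    if compound <= -threshold:\n",
"        return 'negative'\n",
"    return 'neutral'\n",
"\n",
"vader_df['label'] = vader_df['compound'].apply(classify)\n",
"vader_df['label'].value_counts()"
]
},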
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Save the dataframe to a CSV file, without the pandas index."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vader_df.to_csv('vader_sentiments.csv', index=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}