simonlindgren · October 12, 2019 01:21
diff --git a/vader.ipynb b/vader.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sentiment analysis with VADER\n",
    "[VADER](https://github.com/cjhutto/vaderSentiment) (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.\n",
    "\n",
    "First, import pandas and set up the VADER analyzer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer\n",
    "analyzer = SentimentIntensityAnalyzer()\n",
    "\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We read a text file, in which each line is a sentence (or short social media post), into a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
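    "with open('tweets.txt', 'r') as f:  # the file is closed when the block ends\n",
    "    sentences = [line.strip() for line in f if line.strip()]  # drop newlines and blank lines"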
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remove duplicates from the sentence list, **_if_** that's what we want to do."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sentences = list(dict.fromkeys(sentences))  # de-duplicate, keeping first-seen order (set() would discard it)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the sentiment analysis and collect the scores as a [list](https://docs.python.org/3/tutorial/introduction.html#lists) of [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries), one per sentence, each holding the `neg`, `neu`, `pos` and `compound` scores."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vader_scores = [analyzer.polarity_scores(sentence) for sentence in sentences]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Iterate over `sentences` and `vader_scores` in parallel, and store each sentence under the `text` key in the dictionary of its scores."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for sentence, score_dict in zip(sentences, vader_scores):\n",
    "    score_dict['text'] = sentence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now `vader_scores` is a list of dictionaries, each holding a sentence and its scores. We convert it to a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame) and sort it by `compound` score, from most positive to most negative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vader_df = pd.DataFrame(vader_scores)[['text', 'compound', 'neg', 'neu', 'pos']]\n",
    "vader_df = vader_df.sort_values('compound', ascending=False)\n",
    "vader_df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Save the results to a CSV file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vader_df.to_csv(\"vader_sentiments.csv\", index=False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
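
The de-duplication and pairing steps in the notebook can be sketched as plain Python that runs without VADER installed. This is a minimal sketch under stated assumptions: `clean_lines`, `attach_text`, and `fake_scorer` are hypothetical names that do not appear in the notebook, stripping whitespace and skipping blank lines is an added assumption about the input file, and `fake_scorer` is only a stand-in for `analyzer.polarity_scores` that returns fixed values.

```python
def clean_lines(lines):
    """Strip whitespace, drop blank lines, and remove duplicates in first-seen order."""
    stripped = (line.strip() for line in lines)
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    return list(dict.fromkeys(s for s in stripped if s))


def attach_text(sentences, scorer):
    """Score each sentence and store the sentence itself under the 'text' key."""
    rows = []
    for sentence in sentences:
        row = dict(scorer(sentence))  # copy, so the scorer's own dict is not mutated
        row["text"] = sentence
        rows.append(row)
    return rows


def fake_scorer(sentence):
    # Hypothetical stand-in for analyzer.polarity_scores; NOT VADER's real scoring.
    return {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0}


# Made-up input lines, shaped like the output of readlines().
raw = ["I love this!\n", "meh\n", "\n", "I love this!\n"]
sentences = clean_lines(raw)
rows = attach_text(sentences, fake_scorer)
```

In the notebook itself, passing `analyzer.polarity_scores` as `scorer` should give a list of dictionaries that `pd.DataFrame` accepts directly.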