@imankulov
Last active February 25, 2018 12:25
PyCoffee 2018-02-28 (Trump tweets / Bitcoin price correlation)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# pycoffee-2018-02-28 ☕️🐍\n",
"\n",
"This time at Porto i/o, for a change, we decided to get our hands dirty and play with some data visualisation.\n",
"\n",
"We started with a short brainstorm. The result was an idea: find out whether there's any correlation between the sentiment of Donald Trump's tweets and the Bitcoin price. At first sight it may not make any sense, but if you think deeper, all of a sudden they have something inexplicably in common: they are both unpredictable, they love to climb up and fall down, and they both caused the biggest hype of 2017.\n",
"\n",
"Here we go, então!\n",
"\n",
"Let's start with Trump's tweets. Believe it or not, there's a dedicated website (and a GitHub repository) that automatically downloads each of them and shares them in an easy-to-parse json.zip format. The website is http://www.trumptwitterarchive.com/, and the corresponding repository is https://github.com/bpb27/trump_tweet_data_archive\n",
"\n",
"We downloaded and unzipped the data and then created the data frame."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>created_at</th>\n",
" <th>favorite_count</th>\n",
" <th>id_str</th>\n",
" <th>in_reply_to_user_id_str</th>\n",
" <th>is_retweet</th>\n",
" <th>retweet_count</th>\n",
" <th>source</th>\n",
" <th>text</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2016-12-31 18:59:04</td>\n",
" <td>0</td>\n",
" <td>815271067749060608</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>9529</td>\n",
" <td>Twitter for iPhone</td>\n",
" <td>RT @realDonaldTrump: Happy Birthday @DonaldJTr...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2016-12-31 18:58:12</td>\n",
" <td>55601</td>\n",
" <td>815270850916208640</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>9529</td>\n",
" <td>Twitter for iPhone</td>\n",
" <td>Happy Birthday @DonaldJTrumpJr!\\nhttps://t.co/...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2016-12-31 13:17:21</td>\n",
" <td>350860</td>\n",
" <td>815185071317676032</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>141853</td>\n",
" <td>Twitter for Android</td>\n",
" <td>Happy New Year to all, including to my many en...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2016-12-30 22:18:18</td>\n",
" <td>84254</td>\n",
" <td>814958820980039680</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>23213</td>\n",
" <td>Twitter for Android</td>\n",
" <td>Russians are playing @CNN and @NBCNews for suc...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2016-12-30 19:46:55</td>\n",
" <td>25336</td>\n",
" <td>814920722208296960</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>7366</td>\n",
" <td>Twitter for iPhone</td>\n",
" <td>Join @AmerIcan32, founded by Hall of Fame lege...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" created_at favorite_count id_str \\\n",
"0 2016-12-31 18:59:04 0 815271067749060608 \n",
"1 2016-12-31 18:58:12 55601 815270850916208640 \n",
"2 2016-12-31 13:17:21 350860 815185071317676032 \n",
"3 2016-12-30 22:18:18 84254 814958820980039680 \n",
"4 2016-12-30 19:46:55 25336 814920722208296960 \n",
"\n",
" in_reply_to_user_id_str is_retweet retweet_count source \\\n",
"0 NaN True 9529 Twitter for iPhone \n",
"1 NaN False 9529 Twitter for iPhone \n",
"2 NaN False 141853 Twitter for Android \n",
"3 NaN False 23213 Twitter for Android \n",
"4 NaN False 7366 Twitter for iPhone \n",
"\n",
" text \n",
"0 RT @realDonaldTrump: Happy Birthday @DonaldJTr... \n",
"1 Happy Birthday @DonaldJTrumpJr!\\nhttps://t.co/... \n",
"2 Happy New Year to all, including to my many en... \n",
"3 Russians are playing @CNN and @NBCNews for suc... \n",
"4 Join @AmerIcan32, founded by Hall of Fame lege... "
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df2016 = pd.read_json('condensed_2016.json')\n",
"df2017 = pd.read_json('condensed_2017.json')\n",
"df2018 = pd.read_json('condensed_2018.json')\n",
"\n",
"trump = pd.concat([df2016, df2017, df2018])\n",
"trump.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step is to estimate the Donald's mood for every tweet. None of us had previous experience with sentiment analysis, so we decided to follow the StackOverflow-driven development model.\n",
"\n",
"Quite quickly we found the model we wanted. Naturally, it was none other than the venerable [NLTK][]. It ships with [VADER][], a ready-made lexicon- and rule-based sentiment model, which we decided to use without a moment's hesitation.\n",
"\n",
"It was much easier than we thought. The quality of such a blind application of someone else's model is questionable, of course, but we didn't want to spend too much time on such minor details as model validation.\n",
"\n",
"The sentiment analyzer takes a phrase and returns a dictionary with several fields. The \"compound\" field is basically a normalized version of the difference between the positive and negative elements found in the text, ranging from -1 to 1.\n",
"\n",
"Looks good, but on top of that we can think of the *intensity* of the phrase, which is the sum of the `pos` and `neg` scores.\n",
"\n",
"[NLTK]: http://www.nltk.org/\n",
"[VADER]: http://www.nltk.org/howto/sentiment.html"
]
},
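{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what the analyzer returns for a single phrase. It assumes the `vader_lexicon` resource can be fetched with `nltk.download`; the sample phrase and the `demo_*` names are made up for illustration and are not part of the original session."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import nltk\n",
"from nltk.sentiment.vader import SentimentIntensityAnalyzer\n",
"\n",
"# VADER needs its lexicon; assumption: it is available for download\n",
"nltk.download('vader_lexicon', quiet=True)\n",
"\n",
"demo_sid = SentimentIntensityAnalyzer()\n",
"demo_scores = demo_sid.polarity_scores('What a great day, but the traffic was horrible!')\n",
"\n",
"# The result is a dict with the keys 'neg', 'neu', 'pos' and 'compound'\n",
"print(demo_scores)\n",
"\n",
"# Intensity as defined above: the share of the text that is not neutral\n",
"print(demo_scores['pos'] + demo_scores['neg'])"
]
},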
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from nltk.sentiment.vader import SentimentIntensityAnalyzer\n",
"sid = SentimentIntensityAnalyzer()\n",
"\n",
"def sentiment(row):\n",
"    # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound'\n",
"    score = sid.polarity_scores(row.text)\n",
"    row['sentiment'] = score['compound']\n",
"    row['intensity'] = score['pos'] + score['neg']\n",
"    return row\n",
" \n",
"trump = trump.apply(sentiment, axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is what the results look like:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>text</th>\n",
" <th>sentiment</th>\n",
" <th>intensity</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>RT @realDonaldTrump: Happy Birthday @DonaldJTrumpJr!\\nhttps://t.co/uRxyCD3hBz</td>\n",
" <td>0.6114</td>\n",
" <td>0.444</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Happy Birthday @DonaldJTrumpJr!\\nhttps://t.co/uRxyCD3hBz</td>\n",
" <td>0.6114</td>\n",
" <td>0.571</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Happy New Year to all, including to my many enemies and those who have fought me and lost so bad...</td>\n",
" <td>-0.4911</td>\n",
" <td>0.476</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Russians are playing @CNN and @NBCNews for such fools - funny to watch, they don't have a clue! ...</td>\n",
" <td>0.2695</td>\n",
" <td>0.333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Join @AmerIcan32, founded by Hall of Fame legend @JimBrownNFL32 on 1/19/2017 in Washington, D.C....</td>\n",
" <td>0.6249</td>\n",
" <td>0.282</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Great move on delay (by V. Putin) - I always knew he was very smart!</td>\n",
" <td>0.7257</td>\n",
" <td>0.492</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>My Administration will follow two simple rules: https://t.co/ZWk0j4H8Qy</td>\n",
" <td>0.0000</td>\n",
" <td>0.000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>'Economists say Trump delivered hope' https://t.co/SjGBgglIuQ</td>\n",
" <td>0.4404</td>\n",
" <td>0.367</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>not anymore. The beginning of the end was the horrible Iran deal, and now this (U.N.)! Stay stro...</td>\n",
" <td>-0.1984</td>\n",
" <td>0.251</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>We cannot continue to let Israel be treated with such total disdain and disrespect. They used to...</td>\n",
" <td>0.3400</td>\n",
" <td>0.398</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" text \\\n",
"0 RT @realDonaldTrump: Happy Birthday @DonaldJTrumpJr!\\nhttps://t.co/uRxyCD3hBz \n",
"1 Happy Birthday @DonaldJTrumpJr!\\nhttps://t.co/uRxyCD3hBz \n",
"2 Happy New Year to all, including to my many enemies and those who have fought me and lost so bad... \n",
"3 Russians are playing @CNN and @NBCNews for such fools - funny to watch, they don't have a clue! ... \n",
"4 Join @AmerIcan32, founded by Hall of Fame legend @JimBrownNFL32 on 1/19/2017 in Washington, D.C.... \n",
"5 Great move on delay (by V. Putin) - I always knew he was very smart! \n",
"6 My Administration will follow two simple rules: https://t.co/ZWk0j4H8Qy \n",
"7 'Economists say Trump delivered hope' https://t.co/SjGBgglIuQ \n",
"8 not anymore. The beginning of the end was the horrible Iran deal, and now this (U.N.)! Stay stro... \n",
"9 We cannot continue to let Israel be treated with such total disdain and disrespect. They used to... \n",
"\n",
" sentiment intensity \n",
"0 0.6114 0.444 \n",
"1 0.6114 0.571 \n",
"2 -0.4911 0.476 \n",
"3 0.2695 0.333 \n",
"4 0.6249 0.282 \n",
"5 0.7257 0.492 \n",
"6 0.0000 0.000 \n",
"7 0.4404 0.367 \n",
"8 -0.1984 0.251 \n",
"9 0.3400 0.398 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.set_option('max_colwidth', 100)\n",
"trump[['text', 'sentiment', 'intensity']].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looks like it finally makes some sense.\n",
"\n",
"To warm up, we decided to draw a couple of plots to see if they reveal anything interesting. We plot random samples of 1000 tweets to make the plots less dense."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"ax = trump.sample(1000).plot(x='created_at', y='sentiment', style='.')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x117502cc0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ax = trump.sample(1000).plot(x='created_at', y='intensity', style='.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I guess that shows what everybody knows already. The mood of the 45th President of the United States is fickle and could probably serve as a good seed for your random number generator.\n",
"\n",
"That said, with so many tweets (he produces about 8-10 per day), it's too difficult to see the trend on the plot. Let's aggregate the data to see the monthly average."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>favorite_count</th>\n",
" <th>id_str</th>\n",
" <th>in_reply_to_user_id_str</th>\n",
" <th>is_retweet</th>\n",
" <th>retweet_count</th>\n",
" <th>sentiment</th>\n",
" <th>intensity</th>\n",
" </tr>\n",
" <tr>\n",
" <th>month</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2016-01</th>\n",
" <td>5731.091463</td>\n",
" <td>6.889946e+17</td>\n",
" <td>NaN</td>\n",
" <td>0.026423</td>\n",
" <td>2152.260163</td>\n",
" <td>0.212208</td>\n",
" <td>0.246764</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016-02</th>\n",
" <td>8208.722672</td>\n",
" <td>6.997745e+17</td>\n",
" <td>1.084716e+08</td>\n",
" <td>0.044534</td>\n",
" <td>3129.856275</td>\n",
" <td>0.163036</td>\n",
" <td>0.253002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016-03</th>\n",
" <td>14815.274376</td>\n",
" <td>7.096324e+17</td>\n",
" <td>NaN</td>\n",
" <td>0.047619</td>\n",
" <td>5428.079365</td>\n",
" <td>0.158371</td>\n",
" <td>0.272186</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016-04</th>\n",
" <td>13381.989437</td>\n",
" <td>7.212304e+17</td>\n",
" <td>NaN</td>\n",
" <td>0.028169</td>\n",
" <td>4762.964789</td>\n",
" <td>0.316074</td>\n",
" <td>0.262669</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016-05</th>\n",
" <td>15340.686610</td>\n",
" <td>7.321916e+17</td>\n",
" <td>2.657642e+08</td>\n",
" <td>0.019943</td>\n",
" <td>5254.982906</td>\n",
" <td>0.211409</td>\n",
" <td>0.270205</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" favorite_count id_str in_reply_to_user_id_str is_retweet \\\n",
"month \n",
"2016-01 5731.091463 6.889946e+17 NaN 0.026423 \n",
"2016-02 8208.722672 6.997745e+17 1.084716e+08 0.044534 \n",
"2016-03 14815.274376 7.096324e+17 NaN 0.047619 \n",
"2016-04 13381.989437 7.212304e+17 NaN 0.028169 \n",
"2016-05 15340.686610 7.321916e+17 2.657642e+08 0.019943 \n",
"\n",
" retweet_count sentiment intensity \n",
"month \n",
"2016-01 2152.260163 0.212208 0.246764 \n",
"2016-02 3129.856275 0.163036 0.253002 \n",
"2016-03 5428.079365 0.158371 0.272186 \n",
"2016-04 4762.964789 0.316074 0.262669 \n",
"2016-05 5254.982906 0.211409 0.270205 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
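],
"source": [
"trump['month'] = trump.created_at.apply(lambda dt: dt.strftime('%Y-%m'))\n",
"# Average only the numeric columns; text columns such as 'text' and 'source' cannot be averaged\n",
"trump_monthly = trump.groupby('month').mean(numeric_only=True)\n",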
"trump_monthly.head()"
]
},
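{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative sketch of the same monthly aggregation, shown on a tiny made-up frame (`toy` and its values are hypothetical). Here `dt.to_period('M')` keeps the month as a date-like period rather than a formatted string, and only the column of interest is averaged."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data standing in for the tweets frame\n",
"toy = pd.DataFrame({\n",
"    'created_at': pd.to_datetime(['2016-01-03', '2016-01-20', '2016-02-11']),\n",
"    'sentiment': [0.5, -0.1, 0.3],\n",
"})\n",
"\n",
"# Group by calendar month and average a single column\n",
"toy_monthly = toy.groupby(toy['created_at'].dt.to_period('M'))['sentiment'].mean()\n",
"print(toy_monthly)"
]
},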
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x116f8def0>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x1176aaac8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"fig, axes = plt.subplots(nrows=2)\n",
"ax0, ax1 = axes\n",
"\n",
"trump_monthly.sentiment.plot.bar(ax=ax0, color='k')\n",
"trump_monthly.intensity.plot.bar(ax=ax1, color='k')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The overall sentiment of the messages is positive, and the intensity stays about the same throughout the two years.\n",
"\n",
"Then we played a bit with the retweet count to see if there's any correlation between the date and the average number of retweets, or whether sentiment or intensity can affect that number.\n",
"\n",
"At this point we had kind of forgotten that we wanted to find a correlation between these tweets and Bitcoin prices, and were driven by pure curiosity about the emotions of POTUS."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x116d746a0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ax = trump.boxplot('retweet_count', by='month', showfliers=False)\n",
"_ = plt.xticks(rotation=90)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x116e88940>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ax = trump.sample(1000).plot(x='intensity', y='retweet_count', style='.', logy=True)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x1176c9470>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ax = trump.sample(1000).plot('sentiment', 'retweet_count', style='.', logy=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It looks natural that the popularity of his Twitter account grew significantly around the second half of 2016, from the moment he was officially recognized as the main Republican candidate for the White House. Other than that, nothing particularly interesting.\n",
"\n",
"At this point it was totally obvious that we would not find any correlation between Trump's tweets and Bitcoin prices, but we decided to finish this up anyway.\n",
"\n",
"We downloaded the list of BTC prices with the [CoinDesk API](https://www.coindesk.com/api/) and annotated every tweet with the price Bitcoin had that day."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import datetime\n",
"\n",
"\n",
"url = 'https://api.coindesk.com/v1/bpi/historical/close.json'\n",
"prices = requests.get(url, params={\n",
" 'start': '2015-01-01',\n",
" 'end': '2018-02-24',\n",
"}).json()['bpi']\n",
"\n",
"trump['btc_price'] = trump.created_at.apply(lambda dt: prices[dt.strftime('%Y-%m-%d')])"
]
},
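{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of a more forgiving lookup, on made-up values (`toy_prices` and `toy_tweets` are hypothetical, not API data). Indexing the dict directly raises `KeyError` for a day that is missing from the price list, whereas `Series.map` leaves `NaN` in that row."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical daily closing prices keyed by 'YYYY-MM-DD', shaped like the 'bpi' dict above\n",
"toy_prices = {'2017-01-01': 1000.0, '2017-01-02': 1020.5}\n",
"\n",
"toy_tweets = pd.DataFrame({\n",
"    'created_at': pd.to_datetime(['2017-01-01 09:00', '2017-01-02 18:30', '2017-01-03 07:15']),\n",
"})\n",
"\n",
"# Format each timestamp as a day string, then look it up; unknown days become NaN\n",
"toy_tweets['btc_price'] = toy_tweets['created_at'].dt.strftime('%Y-%m-%d').map(toy_prices)\n",
"print(toy_tweets)"
]
},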
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, to settle the question, we built a correlation matrix to see if there's any correlation between the columns of our data frame."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>favorite_count</th>\n",
" <th>id_str</th>\n",
" <th>in_reply_to_user_id_str</th>\n",
" <th>is_retweet</th>\n",
" <th>retweet_count</th>\n",
" <th>sentiment</th>\n",
" <th>intensity</th>\n",
" <th>btc_price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>favorite_count</th>\n",
" <td>1.000000</td>\n",
" <td>0.591057</td>\n",
" <td>-0.430882</td>\n",
" <td>-0.268419</td>\n",
" <td>0.860525</td>\n",
" <td>-0.012141</td>\n",
" <td>0.098535</td>\n",
" <td>0.396370</td>\n",
" </tr>\n",
" <tr>\n",
" <th>id_str</th>\n",
" <td>0.591057</td>\n",
" <td>1.000000</td>\n",
" <td>-0.567039</td>\n",
" <td>0.144559</td>\n",
" <td>0.474631</td>\n",
" <td>0.029834</td>\n",
" <td>-0.003928</td>\n",
" <td>0.768578</td>\n",
" </tr>\n",
" <tr>\n",
" <th>in_reply_to_user_id_str</th>\n",
" <td>-0.430882</td>\n",
" <td>-0.567039</td>\n",
" <td>1.000000</td>\n",
" <td>NaN</td>\n",
" <td>-0.275194</td>\n",
" <td>-0.089284</td>\n",
" <td>-0.105844</td>\n",
" <td>-0.273243</td>\n",
" </tr>\n",
" <tr>\n",
" <th>is_retweet</th>\n",
" <td>-0.268419</td>\n",
" <td>0.144559</td>\n",
" <td>NaN</td>\n",
" <td>1.000000</td>\n",
" <td>-0.056263</td>\n",
" <td>-0.012187</td>\n",
" <td>-0.134873</td>\n",
" <td>0.097504</td>\n",
" </tr>\n",
" <tr>\n",
" <th>retweet_count</th>\n",
" <td>0.860525</td>\n",
" <td>0.474631</td>\n",
" <td>-0.275194</td>\n",
" <td>-0.056263</td>\n",
" <td>1.000000</td>\n",
" <td>-0.068394</td>\n",
" <td>0.045699</td>\n",
" <td>0.312220</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sentiment</th>\n",
" <td>-0.012141</td>\n",
" <td>0.029834</td>\n",
" <td>-0.089284</td>\n",
" <td>-0.012187</td>\n",
" <td>-0.068394</td>\n",
" <td>1.000000</td>\n",
" <td>0.231508</td>\n",
" <td>0.043940</td>\n",
" </tr>\n",
" <tr>\n",
" <th>intensity</th>\n",
" <td>0.098535</td>\n",
" <td>-0.003928</td>\n",
" <td>-0.105844</td>\n",
" <td>-0.134873</td>\n",
" <td>0.045699</td>\n",
" <td>0.231508</td>\n",
" <td>1.000000</td>\n",
" <td>0.014950</td>\n",
" </tr>\n",
" <tr>\n",
" <th>btc_price</th>\n",
" <td>0.396370</td>\n",
" <td>0.768578</td>\n",
" <td>-0.273243</td>\n",
" <td>0.097504</td>\n",
" <td>0.312220</td>\n",
" <td>0.043940</td>\n",
" <td>0.014950</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" favorite_count id_str in_reply_to_user_id_str \\\n",
"favorite_count 1.000000 0.591057 -0.430882 \n",
"id_str 0.591057 1.000000 -0.567039 \n",
"in_reply_to_user_id_str -0.430882 -0.567039 1.000000 \n",
"is_retweet -0.268419 0.144559 NaN \n",
"retweet_count 0.860525 0.474631 -0.275194 \n",
"sentiment -0.012141 0.029834 -0.089284 \n",
"intensity 0.098535 -0.003928 -0.105844 \n",
"btc_price 0.396370 0.768578 -0.273243 \n",
"\n",
" is_retweet retweet_count sentiment intensity \\\n",
"favorite_count -0.268419 0.860525 -0.012141 0.098535 \n",
"id_str 0.144559 0.474631 0.029834 -0.003928 \n",
"in_reply_to_user_id_str NaN -0.275194 -0.089284 -0.105844 \n",
"is_retweet 1.000000 -0.056263 -0.012187 -0.134873 \n",
"retweet_count -0.056263 1.000000 -0.068394 0.045699 \n",
"sentiment -0.012187 -0.068394 1.000000 0.231508 \n",
"intensity -0.134873 0.045699 0.231508 1.000000 \n",
"btc_price 0.097504 0.312220 0.043940 0.014950 \n",
"\n",
" btc_price \n",
"favorite_count 0.396370 \n",
"id_str 0.768578 \n",
"in_reply_to_user_id_str -0.273243 \n",
"is_retweet 0.097504 \n",
"retweet_count 0.312220 \n",
"sentiment 0.043940 \n",
"intensity 0.014950 \n",
"btc_price 1.000000 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
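],
"source": [
"# Correlate only the numeric and boolean columns; text and datetime columns have no correlation coefficient\n",
"trump.select_dtypes(include=['number', 'bool']).corr()"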
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, as you can see, we found some interdependencies between variables. The \"0.86\" coefficient between \"favorite_count\" and \"retweet_count\" expresses the obvious fact that likes and retweets go hand in hand.\n",
"\n",
"As for the Bitcoin price, its closest predictor turned out to be the tweet ID, with a coefficient of about 0.77. This doesn't come as a surprise at all: both tweet IDs and the BTC price show a tendency to grow steadily throughout 2016 and 2017. Needless to say, we were quite excited to finally find the main factor behind the Bitcoin price growth and decided to stop there.\n",
"\n",
"\n",
"If you like the investigation we made, you may find many more, even more stunning examples made by like-minded scientists on the [Spurious Correlations website](http://www.tylervigen.com/spurious-correlations), a website which makes you think about the oddity of the world."
]
}
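,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a footnote to the tweet ID finding, here is a small synthetic illustration of the effect (random data, not the tweets; all names are made up). Two independent series that both drift upwards tend to correlate strongly in levels, while their step-to-step changes typically do not."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"rng = np.random.RandomState(0)\n",
"\n",
"# Two independent random walks with a positive drift\n",
"a = pd.Series(np.cumsum(rng.normal(loc=1.0, scale=1.0, size=500)))\n",
"b = pd.Series(np.cumsum(rng.normal(loc=1.0, scale=1.0, size=500)))\n",
"\n",
"# Correlation of the raw levels is dominated by the shared upward trend\n",
"print('levels: ', a.corr(b))\n",
"\n",
"# Correlation of the step-to-step changes removes the trend\n",
"print('changes:', a.diff().corr(b.diff()))"
]
}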
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}