Created
June 11, 2019 15:51
Created on Cognitive Class Labs
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Stats 101\n", | |
"## Correlation and Covariance\n", | |
"### Covariance" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import pandas as pd\n", | |
"import numpy as np\n", | |
"import math" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" Economic Growth (%Xi) S&P 500 Returns (%Yi)\n", | |
"0 2.3 8\n", | |
"1 2.5 9\n", | |
"2 3.6 13\n" | |
] | |
} | |
], | |
"source": [ | |
"df = pd.DataFrame([[2.3, 8], [2.5, 9], [3.6, 13]], columns=['Economic Growth (%Xi)', 'S&P 500 Returns (%Yi)'])\n", | |
"print(df)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Economic Growth (%Xi) 2.8\n", | |
"S&P 500 Returns (%Yi) 10.0\n", | |
"dtype: float64" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Find the sample mean for x and y\n", | |
"df.mean()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" Economic Growth (%Xi) S&P 500 Returns (%Yi) x meandif y meandif\n", | |
"0 2.3 8 -0.5 -2.0\n", | |
"1 2.5 9 -0.3 -1.0\n", | |
"2 3.6 13 0.8 3.0\n" | |
] | |
} | |
], | |
"source": [ | |
"# For each x value, find x - the mean of x\n", | |
"df['x meandif'] = df.iloc[:,0] - np.mean(df.iloc[:,0])\n", | |
"\n", | |
"# For each y value, find y - the mean of y\n", | |
"df['y meandif'] = df.iloc[:,1] - np.mean(df.iloc[:,1])\n", | |
"print(df)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" Economic Growth (%Xi) S&P 500 Returns (%Yi) x meandif y meandif mult\n", | |
"0 2.3 8 -0.5 -2.0 1.0\n", | |
"1 2.5 9 -0.3 -1.0 0.3\n", | |
"2 3.6 13 0.8 3.0 2.4\n" | |
] | |
} | |
], | |
"source": [ | |
"# Multiply each x mean difference by the corresponding y mean difference\n", | |
"df['mult'] = df['x meandif'] * df['y meandif']\n", | |
"print(df)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1.8500000000000003" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Sum the products from the previous step and divide by n - 1 (sample covariance)\n", | |
"covar = df['mult'].sum() / (len(df) - 1)\n", | |
"covar" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**Covariance** indicates the direction of the **linear relationship** between _X_ and _Y_. Its magnitude depends on the units of _X_ and _Y_, so it is hard to read as a measure of strength on its own." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Correlation" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"3. Sum of the mean differences multiplied:\n", | |
"3.7000000000000006\n", | |
" Economic Growth (%Xi) S&P 500 Returns (%Yi) x meandif y meandif mult \\\n", | |
"0 2.3 8 -0.5 -2.0 1.0 \n", | |
"1 2.5 9 -0.3 -1.0 0.3 \n", | |
"2 3.6 13 0.8 3.0 2.4 \n", | |
"\n", | |
" x meandif_sq y meandif_sq \n", | |
"0 0.25 4.0 \n", | |
"1 0.09 1.0 \n", | |
"2 0.64 9.0 \n", | |
"Sums of mean differences squared\n", | |
"4. X: 0.9800000000000003\n", | |
"5. Y: 14.0\n", | |
"\n", | |
"6. Multiply ^ those: 13.720000000000004\n", | |
"\n", | |
"7. Square root of ^ that: 3.7040518354904273\n", | |
"\n", | |
"8. Divide Step 3 by Step 7:\n", | |
"0.998906107238672\n" | |
] | |
} | |
], | |
"source": [ | |
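"# Steps 1-2 (the means and the mean differences) were computed in the covariance section above\n", | |
"sum_meanmult = df['mult'].sum()\n", | |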
"print('3. Sum of the mean differences multiplied:\\n{}'.format(sum_meanmult))\n", | |
"\n", | |
"df['x meandif_sq'] = df['x meandif']**2\n", | |
"df['y meandif_sq'] = df['y meandif']**2\n", | |
"print(df)\n", | |
"\n", | |
"sum_xmeandif_sq = df['x meandif_sq'].sum()\n", | |
"sum_ymeandif_sq = df['y meandif_sq'].sum()\n", | |
"print('Sums of mean differences squared')\n", | |
"print('4. X: {}'.format(sum_xmeandif_sq))\n", | |
"print('5. Y: {}'.format(sum_ymeandif_sq))\n", | |
"\n", | |
"mult_meandifsq = sum_xmeandif_sq * sum_ymeandif_sq\n", | |
"print('\\n6. Multiply ^ those: {}'.format(mult_meandifsq))\n", | |
"\n", | |
"sqrt_multmeandifsq = math.sqrt(mult_meandifsq)\n", | |
"print('\\n7. Square root of ^ that: {}'.format(sqrt_multmeandifsq))\n", | |
"\n", | |
"pearson_r = sum_meanmult / sqrt_multmeandifsq\n", | |
"print('\\n8. Divide Step 3 by Step 7:\\n{}'.format(pearson_r))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
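"**Pearson's r<sub>XY</sub> (correlation coefficient)** measures the strength and direction of the **linear relationship** between _X_ and _Y_. It equals the covariance divided by the product of the two sample standard deviations, which makes it unit-free and bounded between -1 and 1." | |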
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.8" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
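The step-by-step results in the notebook can be cross-checked against pandas' built-in methods. The sketch below is not part of the original notebook. It rebuilds the same three-row DataFrame and assumes the default behaviour of `Series.cov` (sample covariance, dividing by n - 1) and `Series.corr` (Pearson). The names `cov_xy`, `r_xy`, and `r_from_cov` are introduced here for illustration only.

```python
import math

import pandas as pd

# Same data as in the notebook
df = pd.DataFrame(
    [[2.3, 8], [2.5, 9], [3.6, 13]],
    columns=['Economic Growth (%Xi)', 'S&P 500 Returns (%Yi)'],
)
x = df['Economic Growth (%Xi)']
y = df['S&P 500 Returns (%Yi)']

# Sample covariance (pandas divides by n - 1 by default)
cov_xy = x.cov(y)

# Pearson correlation coefficient (the default method of Series.corr)
r_xy = x.corr(y)

# Pearson's r is the covariance scaled by both sample standard deviations
r_from_cov = cov_xy / (x.std() * y.std())

print('covariance: {}'.format(cov_xy))
print('correlation: {}'.format(r_xy))
print('correlation from covariance: {}'.format(r_from_cov))
```

Up to floating-point rounding, the three values should agree with the 1.85 and 0.9989 obtained by hand in the notebook.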