
@d-tork
Created June 11, 2019 15:51
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Stats 101\n",
"## Correlation and Covariance\n",
"### Covariance"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import math"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Economic Growth (%Xi) S&P 500 Returns (%Yi)\n",
"0 2.3 8\n",
"1 2.5 9\n",
"2 3.6 13\n"
]
}
],
"source": [
"df = pd.DataFrame([[2.3, 8], [2.5, 9], [3.6, 13]], columns=['Economic Growth (%Xi)', 'S&P 500 Returns (%Yi)'])\n",
"print(df)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Economic Growth (%Xi) 2.8\n",
"S&P 500 Returns (%Yi) 10.0\n",
"dtype: float64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Find the sample mean for x and y\n",
"df.mean()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Economic Growth (%Xi) S&P 500 Returns (%Yi) x meandif y meandif\n",
"0 2.3 8 -0.5 -2.0\n",
"1 2.5 9 -0.3 -1.0\n",
"2 3.6 13 0.8 3.0\n"
]
}
],
"source": [
"# For each x value, subtract the mean of x (the deviation x_i - x_bar)\n",
"df['x meandif'] = df.iloc[:,0] - np.mean(df.iloc[:,0])\n",
"\n",
"# For each y value, subtract the mean of y (the deviation y_i - y_bar)\n",
"df['y meandif'] = df.iloc[:,1] - np.mean(df.iloc[:,1])\n",
"print(df)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Economic Growth (%Xi) S&P 500 Returns (%Yi) x meandif y meandif mult\n",
"0 2.3 8 -0.5 -2.0 1.0\n",
"1 2.5 9 -0.3 -1.0 0.3\n",
"2 3.6 13 0.8 3.0 2.4\n"
]
}
],
"source": [
"# Multiply each x mean difference by the corresponding y mean difference\n",
"df['mult'] = df['x meandif'] * df['y meandif']\n",
"print(df)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.8500000000000003"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Sum the products from the previous step and divide by n - 1 (sample covariance)\n",
"covar = df['mult'].sum() / (len(df) - 1)\n",
"covar"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
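"**Covariance** measures the direction of the **linear relationship** between _X_ and _Y_. It is positive when _X_ and _Y_ tend to be above (or below) their means at the same time, and negative when one tends to be above its mean while the other is below. The sample covariance computed above is\n",
"\n",
"$$s_{XY} = \\frac{\\sum_{i=1}^{n} (x_i - \\bar{x})(y_i - \\bar{y})}{n - 1}$$\n",
"\n",
"Its magnitude depends on the units in which _X_ and _Y_ are measured, so covariance on its own is a poor gauge of how _strong_ the relationship is."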
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Correlation"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3. Sum of the mean differences multiplied:\n",
"3.7000000000000006\n",
" Economic Growth (%Xi) S&P 500 Returns (%Yi) x meandif y meandif mult \\\n",
"0 2.3 8 -0.5 -2.0 1.0 \n",
"1 2.5 9 -0.3 -1.0 0.3 \n",
"2 3.6 13 0.8 3.0 2.4 \n",
"\n",
" x meandif_sq y meandif_sq \n",
"0 0.25 4.0 \n",
"1 0.09 1.0 \n",
"2 0.64 9.0 \n",
"Sums of mean differences squared\n",
"4. X: 0.9800000000000003\n",
"5. Y: 14.0\n",
"\n",
"6. Multiply steps 4 and 5: 13.720000000000004\n",
"\n",
"7. Square root of step 6: 3.7040518354904273\n",
"\n",
"8. Divide step 3 by step 7 to get r:\n",
"0.998906107238672\n"
]
}
],
"source": [
"# Steps 1-2 (the mean differences for x and y) were computed in the covariance section\n",
"sum_meanmult = df['mult'].sum()\n",
"print('3. Sum of the mean differences multiplied:\\n{}'.format(sum_meanmult))\n",
"\n",
"# Square each mean difference\n",
"df['x meandif_sq'] = df['x meandif']**2\n",
"df['y meandif_sq'] = df['y meandif']**2\n",
"print(df)\n",
"\n",
"# Sum the squared mean differences for x and y\n",
"sum_xmeandif_sq = df['x meandif_sq'].sum()\n",
"sum_ymeandif_sq = df['y meandif_sq'].sum()\n",
"print('Sums of mean differences squared')\n",
"print('4. X: {}'.format(sum_xmeandif_sq))\n",
"print('5. Y: {}'.format(sum_ymeandif_sq))\n",
"\n",
"mult_meandifsq = sum_xmeandif_sq * sum_ymeandif_sq\n",
"print('\\n6. Multiply steps 4 and 5: {}'.format(mult_meandifsq))\n",
"\n",
"sqrt_multmeandifsq = math.sqrt(mult_meandifsq)\n",
"print('\\n7. Square root of step 6: {}'.format(sqrt_multmeandifsq))\n",
"\n",
"# Pearson's r: the sum of products (step 3) divided by the square root (step 7)\n",
"r_xy = sum_meanmult / sqrt_multmeandifsq\n",
"print('\\n8. Divide step 3 by step 7 to get r:\\n{}'.format(r_xy))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
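"**Pearson's r<sub>XY</sub> (correlation coefficient)** measures both the strength and the direction of the **linear relationship** between _X_ and _Y_. It is the covariance rescaled by the spread of each variable, which is what steps 3 to 8 compute:\n",
"\n",
"$$r_{XY} = \\frac{\\sum_{i=1}^{n} (x_i - \\bar{x})(y_i - \\bar{y})}{\\sqrt{\\sum_{i=1}^{n} (x_i - \\bar{x})^2 \\sum_{i=1}^{n} (y_i - \\bar{y})^2}}$$\n",
"\n",
"Because the units cancel, r<sub>XY</sub> is unit-free and always lies between -1 and 1. Here r is about 0.999, a nearly perfect positive linear relationship."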
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
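The covariance steps in the notebook can be cross-checked against library routines. The following is a minimal sketch that assumes NumPy and pandas are available (both are already imported in the notebook). It recomputes the sample covariance of the same three observations in three ways: by hand, with `np.cov`, and with pandas `Series.cov`. Both library routines divide by n - 1 by default.

```python
import numpy as np
import pandas as pd

# Same data as the notebook: economic growth (x) and S&P 500 returns (y), in percent
x = np.array([2.3, 2.5, 3.6])
y = np.array([8.0, 9.0, 13.0])

# Step-by-step sample covariance: sum of products of deviations, divided by n - 1
n = len(x)
manual_cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is cov(x, y)
numpy_cov = np.cov(x, y)[0, 1]

# pandas also uses the n - 1 denominator by default (ddof=1)
pandas_cov = pd.Series(x).cov(pd.Series(y))

print(manual_cov, numpy_cov, pandas_cov)
```

All three values should agree with the notebook's 1.85 up to floating-point rounding.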
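The correlation steps can be checked the same way. The following is a minimal sketch that assumes NumPy and pandas are available, as in the notebook. It does four things:

1. Reproduces the notebook's steps 3 to 8 for Pearson's r.
2. Confirms them against the identity r = cov(X, Y) / (s_X s_Y).
3. Compares the result with the library routines `np.corrcoef` and `Series.corr`.
4. Shows why r, rather than covariance, is used to judge strength. Multiplying x by 100 (for example, expressing growth in basis points instead of percent) scales the covariance by 100 but leaves r unchanged.

```python
import numpy as np
import pandas as pd

# Same data as the notebook
x = np.array([2.3, 2.5, 3.6])
y = np.array([8.0, 9.0, 13.0])

# Pearson's r as in the notebook's steps 3-8:
# sum of products of deviations over sqrt(product of sums of squared deviations)
dx = x - x.mean()
dy = y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Equivalent form: sample covariance divided by the product of sample standard deviations
r_from_cov = np.cov(x, y)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))

# Library versions: np.corrcoef returns a 2x2 correlation matrix; Series.corr defaults to Pearson
r_numpy = np.corrcoef(x, y)[0, 1]
r_pandas = pd.Series(x).corr(pd.Series(y))

# Rescaling x (percent -> basis points, i.e. x100) changes the covariance but not r
cov_scaled = np.cov(x * 100, y)[0, 1]
r_scaled = np.corrcoef(x * 100, y)[0, 1]

print(r_manual, r_from_cov, r_numpy, r_pandas)
print(np.cov(x, y)[0, 1], cov_scaled, r_scaled)
```

The first line prints four agreeing estimates of about 0.9989. The second shows the covariance growing from about 1.85 to about 185 while r stays the same.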