adityajn105 · April 26, 2019 07:28
diff --git a/hypothesis_testing_demo_datalit_week_2.ipynb b/hypothesis_testing_demo_datalit_week_2.ipynb
 {
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Hypothesis_Testing_Demo_DataLit_Week_2.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/adityajn105/1004491c3e5cde543890bada61665717/hypothesis_testing_demo_datalit_week_2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "metadata": {
        "id": "wWNVqI9ImPfj",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## Hypothesis Testing Demo\n",
        "\n",
        "### School of AI - DataLit Week 2\n",
        "\n",
        "#### Any mistakes made here are the sole property of me, Carson Bentley"
      ]
    },
    {
      "metadata": {
        "id": "DwKEURLNkBGB",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "This notebook is based on an article by Gaël Varoquaux in the [Scipy Lecture Notes](http://scipy-lectures.org/packages/statistics/index.html#hypothesis-testing-comparing-two-groups)."
      ]
    },
    {
      "metadata": {
        "id": "REyIYsJCLcc3",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "See the following for a description of how the data was collected:\n",
        "\n",
        "[Brain Size and Intelligence. Willerman et al. (1991)](https://www3.nd.edu/~busiforc/handouts/Data%20and%20Stories/correlation/Brain%20Size/brainsize.html)"
      ]
    },
    {
      "metadata": {
        "id": "_y8aM6Ow2kHV",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "import pandas as pd"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "BmaMZKG12Ej6",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "URL = 'http://scipy-lectures.org/_downloads/brain_size.csv'"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "jw9ml0n52F0F",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "df = pd.read_csv(URL, sep=';', na_values=\".\", index_col=0)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "0JH1qEG32ruO",
        "colab_type": "code",
        "outputId": "f751f301-b043-47fa-b916-e81dfc6b277d",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 407
        }
      },
      "cell_type": "code",
      "source": [
        "df.head(12)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Gender</th>\n",
              "      <th>FSIQ</th>\n",
              "      <th>VIQ</th>\n",
              "      <th>PIQ</th>\n",
              "      <th>Weight</th>\n",
              "      <th>Height</th>\n",
              "      <th>MRI_Count</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Female</td>\n",
              "      <td>133</td>\n",
              "      <td>132</td>\n",
              "      <td>124</td>\n",
              "      <td>118.0</td>\n",
              "      <td>64.5</td>\n",
              "      <td>816932</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Male</td>\n",
              "      <td>140</td>\n",
              "      <td>150</td>\n",
              "      <td>124</td>\n",
              "      <td>NaN</td>\n",
              "      <td>72.5</td>\n",
              "      <td>1001121</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>Male</td>\n",
              "      <td>139</td>\n",
              "      <td>123</td>\n",
              "      <td>150</td>\n",
              "      <td>143.0</td>\n",
              "      <td>73.3</td>\n",
              "      <td>1038437</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>Male</td>\n",
              "      <td>133</td>\n",
              "      <td>129</td>\n",
              "      <td>128</td>\n",
              "      <td>172.0</td>\n",
              "      <td>68.8</td>\n",
              "      <td>965353</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>Female</td>\n",
              "      <td>137</td>\n",
              "      <td>132</td>\n",
              "      <td>134</td>\n",
              "      <td>147.0</td>\n",
              "      <td>65.0</td>\n",
              "      <td>951545</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>Female</td>\n",
              "      <td>99</td>\n",
              "      <td>90</td>\n",
              "      <td>110</td>\n",
              "      <td>146.0</td>\n",
              "      <td>69.0</td>\n",
              "      <td>928799</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>Female</td>\n",
              "      <td>138</td>\n",
              "      <td>136</td>\n",
              "      <td>131</td>\n",
              "      <td>138.0</td>\n",
              "      <td>64.5</td>\n",
              "      <td>991305</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>Female</td>\n",
              "      <td>92</td>\n",
              "      <td>90</td>\n",
              "      <td>98</td>\n",
              "      <td>175.0</td>\n",
              "      <td>66.0</td>\n",
              "      <td>854258</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>Male</td>\n",
              "      <td>89</td>\n",
              "      <td>93</td>\n",
              "      <td>84</td>\n",
              "      <td>134.0</td>\n",
              "      <td>66.3</td>\n",
              "      <td>904858</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>Male</td>\n",
              "      <td>133</td>\n",
              "      <td>114</td>\n",
              "      <td>147</td>\n",
              "      <td>172.0</td>\n",
              "      <td>68.8</td>\n",
              "      <td>955466</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>Female</td>\n",
              "      <td>132</td>\n",
              "      <td>129</td>\n",
              "      <td>124</td>\n",
              "      <td>118.0</td>\n",
              "      <td>64.5</td>\n",
              "      <td>833868</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12</th>\n",
              "      <td>Male</td>\n",
              "      <td>141</td>\n",
              "      <td>150</td>\n",
              "      <td>128</td>\n",
              "      <td>151.0</td>\n",
              "      <td>70.0</td>\n",
              "      <td>1079549</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "    Gender  FSIQ  VIQ  PIQ  Weight  Height  MRI_Count\n",
              "1   Female   133  132  124   118.0    64.5     816932\n",
              "2     Male   140  150  124     NaN    72.5    1001121\n",
              "3     Male   139  123  150   143.0    73.3    1038437\n",
              "4     Male   133  129  128   172.0    68.8     965353\n",
              "5   Female   137  132  134   147.0    65.0     951545\n",
              "6   Female    99   90  110   146.0    69.0     928799\n",
              "7   Female   138  136  131   138.0    64.5     991305\n",
              "8   Female    92   90   98   175.0    66.0     854258\n",
              "9     Male    89   93   84   134.0    66.3     904858\n",
              "10    Male   133  114  147   172.0    68.8     955466\n",
              "11  Female   132  129  124   118.0    64.5     833868\n",
              "12    Male   141  150  128   151.0    70.0    1079549"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 23
        }
      ]
    },
    {
      "metadata": {
        "id": "vMOb2RmA2sal",
        "colab_type": "code",
        "outputId": "556de9a0-2cb0-4f17-c31a-8859cead749c",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 287
        }
      },
      "cell_type": "code",
      "source": [
        "df.describe()"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>FSIQ</th>\n",
              "      <th>VIQ</th>\n",
              "      <th>PIQ</th>\n",
              "      <th>Weight</th>\n",
              "      <th>Height</th>\n",
              "      <th>MRI_Count</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>count</th>\n",
              "      <td>40.000000</td>\n",
              "      <td>40.000000</td>\n",
              "      <td>40.00000</td>\n",
              "      <td>38.000000</td>\n",
              "      <td>39.000000</td>\n",
              "      <td>4.000000e+01</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>mean</th>\n",
              "      <td>113.450000</td>\n",
              "      <td>112.350000</td>\n",
              "      <td>111.02500</td>\n",
              "      <td>151.052632</td>\n",
              "      <td>68.525641</td>\n",
              "      <td>9.087550e+05</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>std</th>\n",
              "      <td>24.082071</td>\n",
              "      <td>23.616107</td>\n",
              "      <td>22.47105</td>\n",
              "      <td>23.478509</td>\n",
              "      <td>3.994649</td>\n",
              "      <td>7.228205e+04</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>min</th>\n",
              "      <td>77.000000</td>\n",
              "      <td>71.000000</td>\n",
              "      <td>72.00000</td>\n",
              "      <td>106.000000</td>\n",
              "      <td>62.000000</td>\n",
              "      <td>7.906190e+05</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25%</th>\n",
              "      <td>89.750000</td>\n",
              "      <td>90.000000</td>\n",
              "      <td>88.25000</td>\n",
              "      <td>135.250000</td>\n",
              "      <td>66.000000</td>\n",
              "      <td>8.559185e+05</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>50%</th>\n",
              "      <td>116.500000</td>\n",
              "      <td>113.000000</td>\n",
              "      <td>115.00000</td>\n",
              "      <td>146.500000</td>\n",
              "      <td>68.000000</td>\n",
              "      <td>9.053990e+05</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>75%</th>\n",
              "      <td>135.500000</td>\n",
              "      <td>129.750000</td>\n",
              "      <td>128.00000</td>\n",
              "      <td>172.000000</td>\n",
              "      <td>70.500000</td>\n",
              "      <td>9.500780e+05</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>max</th>\n",
              "      <td>144.000000</td>\n",
              "      <td>150.000000</td>\n",
              "      <td>150.00000</td>\n",
              "      <td>192.000000</td>\n",
              "      <td>77.000000</td>\n",
              "      <td>1.079549e+06</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "             FSIQ         VIQ        PIQ      Weight     Height     MRI_Count\n",
              "count   40.000000   40.000000   40.00000   38.000000  39.000000  4.000000e+01\n",
              "mean   113.450000  112.350000  111.02500  151.052632  68.525641  9.087550e+05\n",
              "std     24.082071   23.616107   22.47105   23.478509   3.994649  7.228205e+04\n",
              "min     77.000000   71.000000   72.00000  106.000000  62.000000  7.906190e+05\n",
              "25%     89.750000   90.000000   88.25000  135.250000  66.000000  8.559185e+05\n",
              "50%    116.500000  113.000000  115.00000  146.500000  68.000000  9.053990e+05\n",
              "75%    135.500000  129.750000  128.00000  172.000000  70.500000  9.500780e+05\n",
              "max    144.000000  150.000000  150.00000  192.000000  77.000000  1.079549e+06"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 12
        }
      ]
    },
    {
      "metadata": {
        "id": "ZDFRr3ARDYXd",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from scipy import stats"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "99VbahxAQDYw",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Check out the source code for scipy.stats [here](https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py). Under the hood, scipy uses numpy for the math calculations."
      ]
    },
    {
      "metadata": {
        "id": "u9GCZu0mQyEO",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "### One Sample T-Test"
      ]
    },
    {
      "metadata": {
        "id": "C27UjHZyGqH1",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "I'm curious whether the mean IQs in this sample differ from the population average IQ, which is 100 by design (IQ scores are normed so that the population mean is 100). In this experiment, the null hypothesis is that the mean IQ of the population from which this sample is drawn is 100, and the alternative hypothesis is that it is not.\n",
        "\n",
        "Let's use a significance level (alpha) of 5%."
      ]
    },
    {
      "metadata": {
        "id": "xs5pNzQo26_t",
        "colab_type": "code",
        "outputId": "65eea3bd-4e90-40b3-9fce-11101a571680",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 70
        }
      },
      "cell_type": "code",
      "source": [
        "IQ_column_names = ['FSIQ', 'VIQ', 'PIQ']\n",
        "\n",
        "for IQ_column in IQ_column_names:\n",
        "  print(stats.ttest_1samp(df[IQ_column], 100))"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Ttest_1sampResult(statistic=3.532307014238269, pvalue=0.0010766792736967715)\n",
            "Ttest_1sampResult(statistic=3.3074146385401786, pvalue=0.002030117404781822)\n",
            "Ttest_1sampResult(statistic=3.1030246997178783, pvalue=0.0035555593418294417)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "1o0O9d_dI8om",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Since all three p-values are smaller than alpha, we reject the null hypothesis for each IQ measure. Our mean IQs of 113, 112, and 111 (for the FSIQ, VIQ, and PIQ) would therefore be very unlikely to arise from random sampling variation alone if the population mean were really 100.\n",
        "\n",
        "Speculating about why these IQs are above average, I imagined a scenario in which subjects are recruited for a study at a university. Many of these subjects would naturally be students.\n",
        "\n",
        "As it turns out, this guess was correct, as you can confirm by reading the data description linked at the top of this notebook."
      ]
    },
    {
      "metadata": {
        "id": "uuU4nxGdQ9NT",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "### Two Sample T-Test"
      ]
    },
    {
      "metadata": {
        "id": "alt_xNJzc5Iu",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Suppose we want to compare the IQs of men and women."
      ]
    },
    {
      "metadata": {
        "id": "BOLNeiN9drMG",
        "colab_type": "code",
        "outputId": "747f342e-16bf-42f6-a2b6-931014b487ab",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 123
        }
      },
      "cell_type": "code",
      "source": [
        "groupby_gender = df.groupby('Gender')\n",
        "for IQ_column in IQ_column_names:\n",
        "  for gender, value in groupby_gender[IQ_column]:\n",
        "    print((gender, value.mean()))"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "('Female', 111.9)\n",
            "('Male', 115.0)\n",
            "('Female', 109.45)\n",
            "('Male', 115.25)\n",
            "('Female', 110.45)\n",
            "('Male', 111.6)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "wwi7RZ4beaCn",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Although the males in this sample have higher mean scores on all three IQ measures, these differences are not necessarily statistically significant.\n",
        "\n",
        "Let's run a two-sample t-test to see whether any of these differences is significant at the standard 5% level (alpha = 0.05)."
      ]
    },
    {
      "metadata": {
        "id": "yG3Bfa9nURwC",
        "colab_type": "code",
        "outputId": "bb7c604c-7235-4f9b-c1f9-5696e42d799a",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 70
        }
      },
      "cell_type": "code",
      "source": [
        "female = df[df['Gender'] == 'Female']\n",
        "male = df[df['Gender'] == 'Male']\n",
        "for IQ_column in IQ_column_names:\n",
        "  print(stats.ttest_ind(female[IQ_column], male[IQ_column]))"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Ttest_indResult(statistic=-0.4026724743703011, pvalue=0.6894456253897778)\n",
            "Ttest_indResult(statistic=-0.7726161723275011, pvalue=0.44452876778583217)\n",
            "Ttest_indResult(statistic=-0.15980113150762698, pvalue=0.8738841403250049)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "WPQ94WdMfWlD",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
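        "Since all three p-values are greater than 0.05, we fail to reject the null hypothesis (that men and women in the population have the same mean IQ) in each case. In other words, the differences in mean IQ observed in this sample are not statistically significant. Failing to reject is not proof that the means are equal, though: with only 20 subjects per group, the test has limited power to detect small differences."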
      ]
    }
  ]
 }
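Both tests in the notebook above reduce to short formulas, and it can help to see them written out. The following is a minimal sketch that is not part of the original notebook. It recomputes the one-sample statistic as t = (mean - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom. It recomputes the two-sample statistic with a pooled variance and na + nb - 2 degrees of freedom, which is what `stats.ttest_ind` does with its default `equal_var=True`. Both results are then cross-checked against `scipy.stats`.

To stay self-contained, the sketch does not download the CSV. It hard-codes the FSIQ values of the first 12 subjects from the `df.head(12)` output, so its numbers will not match the 40-subject results printed in the notebook. The helper names `one_sample_t` and `pooled_two_sample_t` are hypothetical.

```python
import numpy as np
from scipy import stats

# FSIQ scores for the first 12 subjects, copied from the df.head(12) output.
# This is only a subset of the 40 rows, so results will NOT match the
# full-dataset t-tests printed in the notebook.
female_fsiq = np.array([133, 137, 99, 138, 92, 132], dtype=float)
male_fsiq = np.array([140, 139, 133, 89, 133, 141], dtype=float)
all_fsiq = np.concatenate([female_fsiq, male_fsiq])


def one_sample_t(x, mu0):
    """Hypothetical helper: one-sample t-test, two-sided, n - 1 dof."""
    n = len(x)
    se = x.std(ddof=1) / np.sqrt(n)  # standard error from the sample std
    t = (x.mean() - mu0) / se
    p = 2 * stats.t.sf(abs(t), df=n - 1)  # two-sided p-value
    return t, p


def pooled_two_sample_t(a, b):
    """Hypothetical helper: Student's two-sample t-test (equal variances)."""
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    se = np.sqrt(sp2 * (1 / na + 1 / nb))
    t = (a.mean() - b.mean()) / se
    p = 2 * stats.t.sf(abs(t), df=na + nb - 2)  # two-sided p-value
    return t, p


t1, p1 = one_sample_t(all_fsiq, 100)
t2, p2 = pooled_two_sample_t(female_fsiq, male_fsiq)

# Cross-check against scipy (ttest_ind defaults to equal_var=True).
ref1 = stats.ttest_1samp(all_fsiq, 100)
ref2 = stats.ttest_ind(female_fsiq, male_fsiq)
print("one-sample:", t1, p1, "| scipy:", ref1.statistic, ref1.pvalue)
print("two-sample:", t2, p2, "| scipy:", ref2.statistic, ref2.pvalue)
```

Passing `equal_var=False` to `stats.ttest_ind` switches to Welch's t-test. Welch's test does not assume that the two groups share a variance.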
	{
	"nbformat": 4,
	"nbformat_minor": 0,
	"metadata": {
	"colab": {
	"name": "Hypothesis_Testing_Demo_DataLit_Week_2.ipynb",
	"version": "0.3.2",
	"provenance": [],
	"include_colab_link": true
	},
	"kernelspec": {
	"name": "python3",
	"display_name": "Python 3"
	}
	},
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "view-in-github",
	"colab_type": "text"
	},
	"source": [
	"<a href=\"https://colab.research.google.com/gist/adityajn105/1004491c3e5cde543890bada61665717/hypothesis_testing_demo_datalit_week_2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
	]
	},
	{
	"metadata": {
	"id": "wWNVqI9ImPfj",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"## Hypothesis Testing Demo\n",
	"\n",
	"### School of AI - DataLit Week 2\n",
	"\n",
	"#### Any mistakes made here are the sole property of me, Carson Bentley"
	]
	},
	{
	"metadata": {
	"id": "DwKEURLNkBGB",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"This notebook is based on an article by Gaël Varoquaux in the [Scipy Lecture Notes](http://scipy-lectures.org/packages/statistics/index.html#hypothesis-testing-comparing-two-groups)."
	]
	},
	{
	"metadata": {
	"id": "REyIYsJCLcc3",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"See the following for a description of how the data was collected:\n",
	"\n",
	"[Brain Size and Intelligence. Willerman et al. (1991)](https://www3.nd.edu/~busiforc/handouts/Data%20and%20Stories/correlation/Brain%20Size/brainsize.html)"
	]
	},
	{
	"metadata": {
	"id": "_y8aM6Ow2kHV",
	"colab_type": "code",
	"colab": {}
	},
	"cell_type": "code",
	"source": [
	"import pandas as pd"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"metadata": {
	"id": "BmaMZKG12Ej6",
	"colab_type": "code",
	"colab": {}
	},
	"cell_type": "code",
	"source": [
	"URL = 'http://scipy-lectures.org/_downloads/brain_size.csv'"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"metadata": {
	"id": "jw9ml0n52F0F",
	"colab_type": "code",
	"colab": {}
	},
	"cell_type": "code",
	"source": [
	"df = pd.read_csv(URL, sep=';', na_values=\".\", index_col=0)"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"metadata": {
	"id": "0JH1qEG32ruO",
	"colab_type": "code",
	"outputId": "f751f301-b043-47fa-b916-e81dfc6b277d",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 407
	}
	},
	"cell_type": "code",
	"source": [
	"df.head(12)"
	],
	"execution_count": 0,
	"outputs": [
	{
	"output_type": "execute_result",
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Gender</th>\n",
	" <th>FSIQ</th>\n",
	" <th>VIQ</th>\n",
	" <th>PIQ</th>\n",
	" <th>Weight</th>\n",
	" <th>Height</th>\n",
	" <th>MRI_Count</th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>1</th>\n",
	" <td>Female</td>\n",
	" <td>133</td>\n",
	" <td>132</td>\n",
	" <td>124</td>\n",
	" <td>118.0</td>\n",
	" <td>64.5</td>\n",
	" <td>816932</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>2</th>\n",
	" <td>Male</td>\n",
	" <td>140</td>\n",
	" <td>150</td>\n",
	" <td>124</td>\n",
	" <td>NaN</td>\n",
	" <td>72.5</td>\n",
	" <td>1001121</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>3</th>\n",
	" <td>Male</td>\n",
	" <td>139</td>\n",
	" <td>123</td>\n",
	" <td>150</td>\n",
	" <td>143.0</td>\n",
	" <td>73.3</td>\n",
	" <td>1038437</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>4</th>\n",
	" <td>Male</td>\n",
	" <td>133</td>\n",
	" <td>129</td>\n",
	" <td>128</td>\n",
	" <td>172.0</td>\n",
	" <td>68.8</td>\n",
	" <td>965353</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>5</th>\n",
	" <td>Female</td>\n",
	" <td>137</td>\n",
	" <td>132</td>\n",
	" <td>134</td>\n",
	" <td>147.0</td>\n",
	" <td>65.0</td>\n",
	" <td>951545</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>6</th>\n",
	" <td>Female</td>\n",
	" <td>99</td>\n",
	" <td>90</td>\n",
	" <td>110</td>\n",
	" <td>146.0</td>\n",
	" <td>69.0</td>\n",
	" <td>928799</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>7</th>\n",
	" <td>Female</td>\n",
	" <td>138</td>\n",
	" <td>136</td>\n",
	" <td>131</td>\n",
	" <td>138.0</td>\n",
	" <td>64.5</td>\n",
	" <td>991305</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>8</th>\n",
	" <td>Female</td>\n",
	" <td>92</td>\n",
	" <td>90</td>\n",
	" <td>98</td>\n",
	" <td>175.0</td>\n",
	" <td>66.0</td>\n",
	" <td>854258</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>9</th>\n",
	" <td>Male</td>\n",
	" <td>89</td>\n",
	" <td>93</td>\n",
	" <td>84</td>\n",
	" <td>134.0</td>\n",
	" <td>66.3</td>\n",
	" <td>904858</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>10</th>\n",
	" <td>Male</td>\n",
	" <td>133</td>\n",
	" <td>114</td>\n",
	" <td>147</td>\n",
	" <td>172.0</td>\n",
	" <td>68.8</td>\n",
	" <td>955466</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>11</th>\n",
	" <td>Female</td>\n",
	" <td>132</td>\n",
	" <td>129</td>\n",
	" <td>124</td>\n",
	" <td>118.0</td>\n",
	" <td>64.5</td>\n",
	" <td>833868</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>12</th>\n",
	" <td>Male</td>\n",
	" <td>141</td>\n",
	" <td>150</td>\n",
	" <td>128</td>\n",
	" <td>151.0</td>\n",
	" <td>70.0</td>\n",
	" <td>1079549</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Gender FSIQ VIQ PIQ Weight Height MRI_Count\n",
	"1 Female 133 132 124 118.0 64.5 816932\n",
	"2 Male 140 150 124 NaN 72.5 1001121\n",
	"3 Male 139 123 150 143.0 73.3 1038437\n",
	"4 Male 133 129 128 172.0 68.8 965353\n",
	"5 Female 137 132 134 147.0 65.0 951545\n",
	"6 Female 99 90 110 146.0 69.0 928799\n",
	"7 Female 138 136 131 138.0 64.5 991305\n",
	"8 Female 92 90 98 175.0 66.0 854258\n",
	"9 Male 89 93 84 134.0 66.3 904858\n",
	"10 Male 133 114 147 172.0 68.8 955466\n",
	"11 Female 132 129 124 118.0 64.5 833868\n",
	"12 Male 141 150 128 151.0 70.0 1079549"
	]
	},
	"metadata": {
	"tags": []
	},
	"execution_count": 23
	}
	]
	},
	{
	"metadata": {
	"id": "vMOb2RmA2sal",
	"colab_type": "code",
	"outputId": "556de9a0-2cb0-4f17-c31a-8859cead749c",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 287
	}
	},
	"cell_type": "code",
	"source": [
	"df.describe()"
	],
	"execution_count": 0,
	"outputs": [
	{
	"output_type": "execute_result",
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>FSIQ</th>\n",
	" <th>VIQ</th>\n",
	" <th>PIQ</th>\n",
	" <th>Weight</th>\n",
	" <th>Height</th>\n",
	" <th>MRI_Count</th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>count</th>\n",
	" <td>40.000000</td>\n",
	" <td>40.000000</td>\n",
	" <td>40.00000</td>\n",
	" <td>38.000000</td>\n",
	" <td>39.000000</td>\n",
	" <td>4.000000e+01</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>mean</th>\n",
	" <td>113.450000</td>\n",
	" <td>112.350000</td>\n",
	" <td>111.02500</td>\n",
	" <td>151.052632</td>\n",
	" <td>68.525641</td>\n",
	" <td>9.087550e+05</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>std</th>\n",
	" <td>24.082071</td>\n",
	" <td>23.616107</td>\n",
	" <td>22.47105</td>\n",
	" <td>23.478509</td>\n",
	" <td>3.994649</td>\n",
	" <td>7.228205e+04</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>min</th>\n",
	" <td>77.000000</td>\n",
	" <td>71.000000</td>\n",
	" <td>72.00000</td>\n",
	" <td>106.000000</td>\n",
	" <td>62.000000</td>\n",
	" <td>7.906190e+05</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>25%</th>\n",
	" <td>89.750000</td>\n",
	" <td>90.000000</td>\n",
	" <td>88.25000</td>\n",
	" <td>135.250000</td>\n",
	" <td>66.000000</td>\n",
	" <td>8.559185e+05</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>50%</th>\n",
	" <td>116.500000</td>\n",
	" <td>113.000000</td>\n",
	" <td>115.00000</td>\n",
	" <td>146.500000</td>\n",
	" <td>68.000000</td>\n",
	" <td>9.053990e+05</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>75%</th>\n",
	" <td>135.500000</td>\n",
	" <td>129.750000</td>\n",
	" <td>128.00000</td>\n",
	" <td>172.000000</td>\n",
	" <td>70.500000</td>\n",
	" <td>9.500780e+05</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>max</th>\n",
	" <td>144.000000</td>\n",
	" <td>150.000000</td>\n",
	" <td>150.00000</td>\n",
	" <td>192.000000</td>\n",
	" <td>77.000000</td>\n",
	" <td>1.079549e+06</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" FSIQ VIQ PIQ Weight Height MRI_Count\n",
	"count 40.000000 40.000000 40.00000 38.000000 39.000000 4.000000e+01\n",
	"mean 113.450000 112.350000 111.02500 151.052632 68.525641 9.087550e+05\n",
	"std 24.082071 23.616107 22.47105 23.478509 3.994649 7.228205e+04\n",
	"min 77.000000 71.000000 72.00000 106.000000 62.000000 7.906190e+05\n",
	"25% 89.750000 90.000000 88.25000 135.250000 66.000000 8.559185e+05\n",
	"50% 116.500000 113.000000 115.00000 146.500000 68.000000 9.053990e+05\n",
	"75% 135.500000 129.750000 128.00000 172.000000 70.500000 9.500780e+05\n",
	"max 144.000000 150.000000 150.00000 192.000000 77.000000 1.079549e+06"
	]
	},
	"metadata": {
	"tags": []
	},
	"execution_count": 12
	}
	]
	},
	{
	"metadata": {
	"id": "ZDFRr3ARDYXd",
	"colab_type": "code",
	"colab": {}
	},
	"cell_type": "code",
	"source": [
	"from scipy import stats"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"metadata": {
	"id": "99VbahxAQDYw",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"Check out the source code for scipy.stats [here](https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py). Under the hood, scipy uses numpy for the math calculations."
	]
	},
	{
	"metadata": {
	"id": "u9GCZu0mQyEO",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"### One Sample T-Test"
	]
	},
	{
	"metadata": {
	"id": "C27UjHZyGqH1",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"I'm curious whether the mean IQs in this sample differ from the population average IQ, which is 100 by design (IQ scores are normed so that the population mean is 100). In this experiment, the null hypothesis is that the mean IQ of the population from which this sample is drawn is 100, and the alternative hypothesis is that it is not.\n",
	"\n",
	"Let's use a significance level (alpha) of 5%."
	]
	},
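	{
	"metadata": {
	"id": "oneSampleTMath",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"Written out, with $\\mu$ standing for the mean IQ of the population the sample comes from, the hypotheses are\n",
	"\n",
	"$$H_0: \\mu = 100 \\qquad H_1: \\mu \\neq 100$$\n",
	"\n",
	"and the one-sample t statistic compares the sample mean $\\bar{x}$ with 100, scaled by the standard error of the mean ($s$ is the sample standard deviation and $n = 40$ is the sample size):\n",
	"\n",
	"$$t = \\frac{\\bar{x} - 100}{s / \\sqrt{n}}, \\qquad \\text{degrees of freedom} = n - 1$$"
	]
	},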
	{
	"metadata": {
	"id": "xs5pNzQo26_t",
	"colab_type": "code",
	"outputId": "65eea3bd-4e90-40b3-9fce-11101a571680",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 70
	}
	},
	"cell_type": "code",
	"source": [
	"IQ_column_names = ['FSIQ', 'VIQ', 'PIQ']\n",
	"\n",
	"for IQ_column in IQ_column_names:\n",
	"    print(stats.ttest_1samp(df[IQ_column], 100))"
	],
	"execution_count": 0,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"Ttest_1sampResult(statistic=3.532307014238269, pvalue=0.0010766792736967715)\n",
	"Ttest_1sampResult(statistic=3.3074146385401786, pvalue=0.002030117404781822)\n",
	"Ttest_1sampResult(statistic=3.1030246997178783, pvalue=0.0035555593418294417)\n"
	],
	"name": "stdout"
	}
	]
	},
	{
	"metadata": {
	"id": "1o0O9d_dI8om",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"Since all three p-values are smaller than alpha (0.05), we reject the null hypothesis for each IQ measure. In other words, sample means as high as ours (113, 112, and 111 for FSIQ, VIQ, and PIQ) would be very unlikely if these subjects came from a population whose mean IQ is 100, so the difference is most likely not just random sampling variation.\n",
	"\n",
	"Speculating as to why these IQs are above average, I imagined a scenario in which subjects were recruited for a study at a university, where many of them would naturally be students.\n",
	"\n",
	"As it turns out, this speculation was correct, as you can confirm by looking at the article linked at the top of this notebook."
	]
	},
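	{
	"metadata": {
	"id": "oneSampleTManualText",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"To see roughly what `stats.ttest_1samp` computes, here is a minimal sketch that evaluates the t statistic and its two-sided p-value by hand and compares them with SciPy's result. It uses a hypothetical, randomly generated `sample` (mean 110, SD 15, n = 40) rather than the Willerman data, so it runs on its own; replacing `sample` with `df['FSIQ']` should reproduce the first line of output above."
	]
	},
	{
	"metadata": {
	"id": "oneSampleTManualCode",
	"colab_type": "code",
	"colab": {}
	},
	"cell_type": "code",
	"source": [
	"import numpy as np\n",
	"from scipy import stats\n",
	"\n",
	"# Hypothetical sample standing in for one IQ column (not the Willerman data)\n",
	"rng = np.random.default_rng(0)\n",
	"sample = rng.normal(loc=110, scale=15, size=40)\n",
	"mu0 = 100  # population mean under the null hypothesis\n",
	"\n",
	"n = len(sample)\n",
	"sd = sample.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)\n",
	"t_manual = (sample.mean() - mu0) / (sd / np.sqrt(n))\n",
	"\n",
	"# Two-sided p-value: chance of a |t| at least this large if H0 is true,\n",
	"# using a t distribution with n - 1 degrees of freedom\n",
	"p_manual = 2 * stats.t.sf(abs(t_manual), n - 1)\n",
	"\n",
	"result = stats.ttest_1samp(sample, mu0)\n",
	"print('by hand:', t_manual, p_manual)\n",
	"print('scipy:  ', result.statistic, result.pvalue)"
	],
	"execution_count": 0,
	"outputs": []
	},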
	{
	"metadata": {
	"id": "uuU4nxGdQ9NT",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"### Two-Sample T-Test"
	]
	},
	{
	"metadata": {
	"id": "alt_xNJzc5Iu",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"Suppose we want to compare the IQs of men and women."
	]
	},
	{
	"metadata": {
	"id": "BOLNeiN9drMG",
	"colab_type": "code",
	"outputId": "747f342e-16bf-42f6-a2b6-931014b487ab",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 123
	}
	},
	"cell_type": "code",
	"source": [
	"groupby_gender = df.groupby('Gender')\n",
	"for IQ_column in IQ_column_names:\n",
	"    for gender, value in groupby_gender[IQ_column]:\n",
	"        print((gender, value.mean()))"
	],
	"execution_count": 0,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"('Female', 111.9)\n",
	"('Male', 115.0)\n",
	"('Female', 109.45)\n",
	"('Male', 115.25)\n",
	"('Female', 110.45)\n",
	"('Male', 111.6)\n"
	],
	"name": "stdout"
	}
	]
	},
	{
	"metadata": {
	"id": "wwi7RZ4beaCn",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"Although the males in this sample have higher average IQs on all three measures, these differences are not necessarily statistically significant.\n",
	"\n",
	"Let's run a two-sample t-test to see whether any of these differences is significant at the standard 5% alpha level."
	]
	},
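	{
	"metadata": {
	"id": "twoSampleTMath",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"Stated formally, with $\\mu_F$ and $\\mu_M$ standing for the mean IQ of the female and male populations, the test is run separately for each IQ measure:\n",
	"\n",
	"$$H_0: \\mu_F = \\mu_M \\qquad H_1: \\mu_F \\neq \\mu_M$$\n",
	"\n",
	"By default `stats.ttest_ind` assumes the two populations have equal variances (Student's t-test) and uses $n_F + n_M - 2$ degrees of freedom."
	]
	},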
	{
	"metadata": {
	"id": "yG3Bfa9nURwC",
	"colab_type": "code",
	"outputId": "bb7c604c-7235-4f9b-c1f9-5696e42d799a",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 70
	}
	},
	"cell_type": "code",
	"source": [
	"female = df[df['Gender'] == 'Female']\n",
	"male = df[df['Gender'] == 'Male']\n",
	"for IQ_column in IQ_column_names:\n",
	"    print(stats.ttest_ind(female[IQ_column], male[IQ_column]))"
	],
	"execution_count": 0,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"Ttest_indResult(statistic=-0.4026724743703011, pvalue=0.6894456253897778)\n",
	"Ttest_indResult(statistic=-0.7726161723275011, pvalue=0.44452876778583217)\n",
	"Ttest_indResult(statistic=-0.15980113150762698, pvalue=0.8738841403250049)\n"
	],
	"name": "stdout"
	}
	]
	},
	{
	"metadata": {
	"id": "WPQ94WdMfWlD",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"Since all three p-values are greater than 0.05, we fail to reject the null hypothesis (that women and men have the same mean score) in each case. In other words, the differences in mean IQ between the women and men in this sample are not statistically significant."
	]
	}
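	,
	{
	"metadata": {
	"id": "twoSampleTManualText",
	"colab_type": "text"
	},
	"cell_type": "markdown",
	"source": [
	"For comparison, here is a minimal sketch of the pooled-variance statistic that `stats.ttest_ind` computes with its default `equal_var=True`, evaluated by hand on two hypothetical, randomly generated groups (`group_a` and `group_b`, 20 values each) rather than on the notebook's `female` and `male` data. Passing `equal_var=False` to `stats.ttest_ind` instead runs Welch's t-test, which drops the equal-variance assumption."
	]
	},
	{
	"metadata": {
	"id": "twoSampleTManualCode",
	"colab_type": "code",
	"colab": {}
	},
	"cell_type": "code",
	"source": [
	"import numpy as np\n",
	"from scipy import stats\n",
	"\n",
	"# Hypothetical groups standing in for the female and male IQ columns\n",
	"rng = np.random.default_rng(1)\n",
	"group_a = rng.normal(loc=110, scale=24, size=20)\n",
	"group_b = rng.normal(loc=114, scale=24, size=20)\n",
	"\n",
	"n_a, n_b = len(group_a), len(group_b)\n",
	"var_a, var_b = group_a.var(ddof=1), group_b.var(ddof=1)\n",
	"\n",
	"# Pooled variance: assumes both populations share one variance\n",
	"pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)\n",
	"se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))\n",
	"t_manual = (group_a.mean() - group_b.mean()) / se\n",
	"p_manual = 2 * stats.t.sf(abs(t_manual), n_a + n_b - 2)\n",
	"\n",
	"result = stats.ttest_ind(group_a, group_b)  # equal_var=True by default\n",
	"print('by hand:', t_manual, p_manual)\n",
	"print('scipy:  ', result.statistic, result.pvalue)"
	],
	"execution_count": 0,
	"outputs": []
	}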
	]
	}