Created
November 14, 2020 17:03
Created on Skills Network Labs
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<center>\n", | |
" <img src=\"https://gitlab.com/ibm/skills-network/courses/placeholder101/-/raw/master/labs/module%201/images/IDSNlogo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\" />\n", | |
"</center>\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# **Introduction to Probability Distribution**\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Estimated time needed: **30** minutes\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In this lab, you will familiarize yourself with the normal probability distribution and work through some exercises.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Objectives\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- Import Libraries\n", | |
"- Introduction to Probability Distributions\n", | |
" - Normal Distributions\n", | |
"- Lab Exercises\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"* * *\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Import Libraries\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"All libraries required for this lab are listed below. They are pre-installed on Skills Network Labs. If you run this notebook in a different environment, e.g. your desktop, you may need to install some of them.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Requirement already satisfied: pandas in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (1.1.3)\n", | |
"Requirement already satisfied: numpy>=1.15.4 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from pandas) (1.19.2)\n", | |
"Requirement already satisfied: python-dateutil>=2.7.3 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from pandas) (2.8.1)\n", | |
"Requirement already satisfied: pytz>=2017.2 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from pandas) (2020.1)\n", | |
"Requirement already satisfied: six>=1.5 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)\n", | |
"Requirement already satisfied: numpy in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (1.19.2)\n", | |
"Requirement already satisfied: matplotlib in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (3.3.2)\n", | |
"Requirement already satisfied: python-dateutil>=2.1 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from matplotlib) (2.8.1)\n", | |
"Requirement already satisfied: cycler>=0.10 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from matplotlib) (0.10.0)\n", | |
"Requirement already satisfied: kiwisolver>=1.0.1 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from matplotlib) (1.2.0)\n", | |
"Requirement already satisfied: certifi>=2020.06.20 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from matplotlib) (2020.6.20)\n", | |
"Requirement already satisfied: pillow>=6.2.0 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from matplotlib) (8.0.1)\n", | |
"Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from matplotlib) (2.4.7)\n", | |
"Requirement already satisfied: numpy>=1.15 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from matplotlib) (1.19.2)\n", | |
"Requirement already satisfied: six>=1.5 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from python-dateutil>=2.1->matplotlib) (1.15.0)\n", | |
"Requirement already satisfied: scipy in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (1.5.3)\n", | |
"Requirement already satisfied: numpy>=1.14.5 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from scipy) (1.19.2)\n" | |
] | |
} | |
], | |
"source": [ | |
"!pip install pandas\n", | |
"!pip install numpy\n", | |
"!pip install matplotlib\n", | |
"# math is part of the Python standard library, so it needs no installation\n", | |
"!pip install scipy" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Import the libraries we need for the lab.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"import matplotlib.pyplot as plt\n", | |
"import scipy.stats\n", | |
"from math import sqrt" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Read the CSV file from the URL into a DataFrame using pandas.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"ratings_url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ST0151EN-SkillsNetwork/labs/teachingratings.csv'\n", | |
"ratings_df = pd.read_csv(ratings_url)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Introduction to Probability Distribution\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In this section, you will learn how to plot probability distributions using the SciPy library in Python.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Normal Distribution\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A normal distribution is a bell-shaped density curve described by its mean μ and standard deviation σ. The curve is symmetrical and centered around its mean. A normal distribution curve looks like this:\n" | |
] | |
}, | |
{ | |
"attachments": { | |
"image.png": { | |
"image/png": "" | |
} | |
}, | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can visualize the curve. Import norm from scipy.stats and plot the density with matplotlib.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"from scipy.stats import norm\n", | |
"\n", | |
"# Plot between -4 and 4 with 0.1 steps.\n", | |
"x_axis = np.arange(-4, 4, 0.1)\n", | |
"# Mean = 0, SD = 1.\n", | |
"plt.plot(x_axis, norm.pdf(x_axis, 0, 1))\n", | |
"plt.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Lab Exercises\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Using the teachers' rating dataset, what is the probability of receiving an evaluation score greater than 4.5?\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Find the mean and standard deviation of the teachers' evaluation scores.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"3.998 0.555\n" | |
] | |
} | |
], | |
"source": [ | |
"eval_mean = round(ratings_df['eval'].mean(), 3)\n", | |
"eval_sd = round(ratings_df['eval'].std(), 3)\n", | |
"print(eval_mean, eval_sd)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Use the scipy.stats module. Because <code>norm.cdf</code> returns the area to the left of a value, i.e. the probability of a score less than that value, we subtract that probability from 1 to get the right tail.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0.1828639734596742\n" | |
] | |
} | |
], | |
"source": [ | |
"prob0 = scipy.stats.norm.cdf((4.5 - eval_mean)/eval_sd)\n", | |
"print(1 - prob0)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Using the teachers' rating dataset, what is the probability of receiving an evaluation score greater than 3.5 and less than 4.2?\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"First we find the probability of getting evaluation scores less than 3.5 using the <code>norm.cdf</code> function\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0.1847801491443654\n" | |
] | |
} | |
], | |
"source": [ | |
"x1 = 3.5\n", | |
"prob1 = scipy.stats.norm.cdf((x1 - eval_mean)/eval_sd)\n", | |
"print(prob1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Then for less than 4.2\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0.642057540461896\n" | |
] | |
} | |
], | |
"source": [ | |
"x2 = 4.2\n", | |
"prob2 = scipy.stats.norm.cdf((x2 - eval_mean)/eval_sd)\n", | |
"print(prob2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The probability, expressed as a percentage, of a teacher receiving an evaluation score between 3.5 and 4.2 is:\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"45.7" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"round((prob2 - prob1)*100, 1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Using the two-tailed test from a normal distribution:\n", | |
"\n", | |
"- A professional basketball team wants to compare its performance with that of players in a regional league.\n", | |
"- The pros are known to have a historic mean of 12 points per game with a standard deviation of 5.5.\n", | |
"- A group of 36 regional players recorded on average 10.7 points per game.\n", | |
"- The pro coach would like to know whether his professional team's average score differs from that of the regional players.\n" | |
] | |
}, | |
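{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"State the null and alternative hypotheses, where $\\mu$ is the mean points per game of the regional players and $\\mu_1 = 12$ is the historic mean of the pros:\n", | |
"\n", | |
"- $H_0$: $\\mu = \\mu_1$ (\"The mean points per game of the regional players is not different from the historic mean\")\n", | |
"- $H_1$: $\\mu \\neq \\mu_1$ (\"The mean points per game of the regional players is different from the historic mean\")\n" | |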
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"When the population standard deviation $\\sigma$ is given and we are working with a sample of size $n$, the sample size enters the z-score formula through the standard error $\\sigma / \\sqrt{n}$:\n", | |
"\n", | |
"$$z = \\frac{\\bar{x} - \\mu}{\\sigma / \\sqrt{n}}$$\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.156" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"## because it is a two-tailed test we multiply the one-tail probability by 2, then round\n", | |
"round(2 * scipy.stats.norm.cdf((10.7 - 12)/(5.5/sqrt(36))), 3)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**Conclusion:** Because the p-value (0.156) is greater than 0.05, we fail to reject the null hypothesis: there is not sufficient evidence to conclude that the mean points per game of the regional players is different from the historic mean.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Practice Questions\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Question 1: Using the teachers' rating dataset, what is the probability of receiving an evaluation score greater than 3.3?\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"## insert code here\n", | |
"x1 = 3.3\n", | |
"prob1 = scipy.stats.norm.cdf((x1 - eval_mean)/eval_sd)\n", | |
"print(1-prob1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Double-click **here** for the solution.\n", | |
"\n", | |
"<!-- The answer is below:\n", | |
"##calculate the probability less than 3.3\n", | |
"prob_less_than = scipy.stats.norm.cdf((3.3 - eval_mean)/eval_sd)\n", | |
"##then remove the probability from 1 to get the area to the right of 3.3\n", | |
"print(1 - prob_less_than)\n", | |
"-->\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Question 2: Using the teachers' rating dataset, what is the probability of receiving an evaluation score between 2 and 3?\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"## insert code here\n", | |
"# probability of a score less than 2\n", | |
"x1 = 2\n", | |
"prob1 = scipy.stats.norm.cdf((x1 - eval_mean)/eval_sd)\n", | |
"print(prob1)\n", | |
"# probability of a score less than 3\n", | |
"x2 = 3\n", | |
"prob2 = scipy.stats.norm.cdf((x2 - eval_mean)/eval_sd)\n", | |
"print(prob2)\n", | |
"# probability (as a percentage) of a score between 2 and 3\n", | |
"round((prob2 - prob1)*100, 1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Double-click **here** for the solution.\n", | |
"\n", | |
"<!-- The answer is below:\n", | |
"## find the probability of receiving a score of less than 2\n", | |
"prob_less_than_2 = scipy.stats.norm.cdf((2 - eval_mean)/eval_sd)\n", | |
"print(prob_less_than_2)\n", | |
"\n", | |
"## find the probability of receiving a score of less than 3\n", | |
"prob_less_than_3 = scipy.stats.norm.cdf((3 - eval_mean)/eval_sd)\n", | |
"print(prob_less_than_3)\n", | |
"\n", | |
"## subtract the smaller probability from the larger one\n", | |
"round((prob_less_than_3 - prob_less_than_2)*100, 1)\n", | |
"-->\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Question 3: To test the hypothesis that sleeping for at least 8 hours makes one smarter, 12 people who have slept for at least 8 hours every day for the past year have their IQ tested.\n", | |
"\n", | |
"- Here are the results: 116, 111, 101, 120, 99, 94, 106, 115, 107, 101, 110, 92\n", | |
"- Test using the following hypotheses: H0: μ = 100 and Ha: μ > 100\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"## insert code here\n", | |
"# define the sample statistics before using them in the z-score\n", | |
"mean_IQ = np.mean([116, 111, 101, 120, 99, 94, 106, 115, 107, 101, 110, 92])\n", | |
"std_devIQ = np.std([116, 111, 101, 120, 99, 94, 106, 115, 107, 101, 110, 92])\n", | |
"round(1 - scipy.stats.norm.cdf((mean_IQ - 100)/(std_devIQ/sqrt(12))), 3)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Double-click **here** for a hint.\n", | |
"\n", | |
"<!-- The hint is below:\n", | |
"### find the mean and standard deviation of the 12 IQs\n", | |
"iq_mean = np.mean([116, 111, 101, 120, 99, 94, 106, 115, 107, 101, 110, 92])\n", | |
"iq_std = np.std([116, 111, 101, 120, 99, 94, 106, 115, 107, 101, 110, 92])\n", | |
"-->\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Double-click **here** for the solution.\n", | |
"\n", | |
"<!-- The answer is below:\n", | |
"### remember to subtract from 1 because we want the upper tail, i.e. a mean IQ greater than 100\n", | |
"round(1-scipy.stats.norm.cdf((iq_mean - 100)/(iq_std/sqrt(12))), 3)\n", | |
"-->\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Authors\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"[Aije Egwaikhide](https://www.linkedin.com/in/aije-egwaikhide?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-ST0151EN-SkillsNetwork-20531532&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) is a Data Scientist at IBM who holds a degree in Economics and Statistics from the University of Manitoba and a Post-grad in Business Analytics from St. Lawrence College, Kingston. She is a current employee of IBM where she started as a Junior Data Scientist at the Global Business Services (GBS) in 2018. Her main role was making meaning out of data for their Oil and Gas clients through basic statistics and advanced Machine Learning algorithms. The highlight of her time in GBS was creating a customized end-to-end Machine learning and Statistics solution on optimizing operations in the Oil and Gas wells. She moved to the Cognitive Systems Group as a Senior Data Scientist where she will be providing the team with actionable insights using Data Science techniques and further improve processes through building machine learning solutions. She recently joined the IBM Developer Skills Network group where she brings her real-world experience to the courses she creates.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Change Log\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n", | |
"| ----------------- | ------- | --------------- | -------------------------------------- |\n", | |
"| 2020-08-14 | 0.1 | Aije Egwaikhide | Created the initial version of the lab |\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" Copyright © 2020 IBM Corporation. This notebook and its source code are released under the terms of the [MIT License](https://cognitiveclass.ai/mit-license?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-ST0151EN-SkillsNetwork-20531532&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-ST0151EN-SkillsNetwork-20531532&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-ST0151EN-SkillsNetwork-20531532&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-ST0151EN-SkillsNetwork-20531532&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ).\n" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python", | |
"language": "python", | |
"name": "conda-env-python-py" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.11" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
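
The tail, interval, and two-tailed calculations in the notebook above can be cross-checked without SciPy. What follows is a minimal sketch, not the lab's own method: it assumes Python 3.8 or later and swaps in the standard library's `statistics.NormalDist` for `scipy.stats.norm`. The helper names `right_tail`, `between`, and `two_tailed_p` are hypothetical, and the mean 3.998 and standard deviation 0.555 are the rounded values printed in the lab.

```python
from math import sqrt
from statistics import NormalDist

# Rounded mean and standard deviation of the 'eval' column, as printed in the lab.
EVAL_MEAN, EVAL_SD = 3.998, 0.555


def right_tail(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd): one minus the area to the left of x."""
    return 1 - NormalDist(mean, sd).cdf(x)


def between(low, high, mean, sd):
    """P(low < X < high): difference of the two left-tail areas."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)


def two_tailed_p(sample_mean, pop_mean, pop_sd, n):
    """Two-tailed p-value of a z-test with a known population standard deviation."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Double the area in the tail beyond |z|.
    return 2 * NormalDist().cdf(-abs(z))


p_above_45 = right_tail(4.5, EVAL_MEAN, EVAL_SD)
p_35_to_42 = between(3.5, 4.2, EVAL_MEAN, EVAL_SD)
p_basketball = two_tailed_p(10.7, 12, 5.5, 36)

print(round(p_above_45, 3))        # the lab prints 0.1828... for this quantity
print(round(p_35_to_42 * 100, 1))  # the lab prints 45.7
print(round(p_basketball, 3))      # the lab prints 0.156
```

Passing the mean and standard deviation to `NormalDist` avoids standardizing by hand; the result is the same as calling the CDF of the standard normal on `(x - mean) / sd`, which is what the lab cells do.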