Created
January 21, 2023 19:22
{ | |
"cells": [ | |
{
"cell_type": "markdown",
"id": "313b67ae",
"metadata": {},
"source": [
"## Hypothesis\n",
"\n",
"We hypothesized that females tend to smile more often than males.\n",
"\n",
"## Experimental Design\n",
"\n",
"To test the hypothesis, we spent an afternoon at a shopping mall, smiling at strangers with whom we made eye contact. For each sex, we counted how many smiled back and how many did not."
]
},
{ | |
"cell_type": "markdown", | |
"id": "22fe4736", | |
"metadata": {}, | |
"source": [ | |
"## Construct a Contingency Table" | |
] | |
}, | |
{
"cell_type": "markdown",
"id": "05a3f478",
"metadata": {},
"source": [
"The data we collected are shown below as a contingency table of response (smile or no smile) by sex."
]
},
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"id": "f210e499", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>female</th>\n", | |
" <th>male</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>smile</th>\n", | |
" <td>28</td>\n", | |
" <td>15</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>no smile</th>\n", | |
" <td>25</td>\n", | |
" <td>31</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" female male\n", | |
"smile 28 15\n", | |
"no smile 25 31" | |
] | |
}, | |
"execution_count": 1, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import pandas as pd\n", | |
"\n", | |
"df = pd.DataFrame(\n", | |
" data = {\n", | |
" 'female': [28, 25],\n", | |
" 'male': [15, 31]\n", | |
" }, \n", | |
" index = pd.Index([\n", | |
" 'smile', \n", | |
" 'no smile']\n", | |
" ))\n", | |
"df" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "91a2dab2", | |
"metadata": {}, | |
"source": [ | |
"## Visualize the contingency table as a mosaic plot" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"id": "ff09fc7b", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<Figure size 640x480 with 3 Axes>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
],
"source": [
"from statsmodels.graphics.mosaicplot import mosaic\n",
"\n",
"# Flatten the table into {(sex, response): count}, a dict format that mosaic() accepts\n",
"d = {\n",
"    (sex, response): count\n",
"    for sex, counts in df.to_dict().items()\n",
"    for response, count in counts.items()\n",
"}\n",
"_ = mosaic(d)"
]
}, | |
{ | |
"cell_type": "markdown", | |
"id": "45cfde08", | |
"metadata": {}, | |
"source": [ | |
"## Compute the size of the association between variables, and its significance" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "3eeb140f", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"odds ratio: 2.3146666666666667\n", | |
"p-value: 0.06676269027846655\n" | |
] | |
} | |
], | |
"source": [ | |
"from scipy.stats import fisher_exact\n", | |
"\n", | |
"odds_ratio, pvalue = fisher_exact(table=df.to_numpy(), alternative='two-sided')\n", | |
"\n", | |
"print(f'odds ratio: {odds_ratio}')\n", | |
"print(f'p-value: {pvalue}')" | |
] | |
}, | |
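{
"cell_type": "markdown",
"id": "b7c41e02",
"metadata": {},
"source": [
"As a sanity check, the odds ratio printed above can be reproduced by hand from the contingency table. This sketch assumes that `fisher_exact` reports the plain sample odds ratio (odds of smiling among females divided by odds of smiling among males), which is consistent with the printed value:\n",
"\n",
"$$\\text{odds ratio} = \\frac{28/25}{15/31} = \\frac{28 \\times 31}{25 \\times 15} = \\frac{868}{375} \\approx 2.31$$\n",
"\n",
"Here 28/25 is the ratio of smiles to non-smiles among females, and 15/31 is the same ratio among males."
]
},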
{
"cell_type": "markdown",
"id": "16e74904",
"metadata": {},
"source": [
"In our sample, the odds of smiling back are about 2.3 times as high for females as for males.\n",
"\n",
"The p-value is the probability of observing a table at least as imbalanced as ours if smiling were unrelated to sex; here it is about 6.7%. At the commonly used 5% significance level, the observed association therefore falls just short of statistical significance.\n",
"\n",
"We conclude that the data are suggestive of, but do not establish at the 5% level, a tendency for females to smile more than males."
]
},
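{
"cell_type": "markdown",
"id": "c93d5f18",
"metadata": {},
"source": [
"To make the p-value concrete, here is a sketch of how a two-sided Fisher exact p-value is usually defined. We assume, without having checked the SciPy source for the version used here, that `fisher_exact` follows this definition; the symbols $X$ and $k$ are introduced only for this illustration. Fix the row and column totals of the table (43 smiles, 56 non-smiles, 53 females, 46 males, 99 people in all). If smiling is unrelated to sex, the number $X$ of females who smile follows a hypergeometric distribution:\n",
"\n",
"$$P(X = k) = \\frac{\\binom{53}{k}\\binom{46}{43-k}}{\\binom{99}{43}}$$\n",
"\n",
"The two-sided p-value sums these probabilities over every $k$ that is no more probable than the observed value $k = 28$:\n",
"\n",
"$$p = \\sum_{k \\,:\\, P(X = k) \\le P(X = 28)} P(X = k)$$"
]
},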
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "0b87d5dd", | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.9.12" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |