josh-hernandez-exe · November 14, 2020 05:39
diff --git a/f-distro-share.ipynb b/f-distro-share.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I'm making this post as an interesting exploration of Theorem 9.3.1 and the F distribution.\n",
    "\n",
    "This came about from a question Casey asked in class. Basically can we rephrase the theorem so that we have $s_{X}^2/s_{Y}^2$ rather than $s_{X}^2/s_{Y}^2$. Dr. Leboeuf made an interesting comment about the data labels being arbitrary but the inference models being specific. So I have given it some thought. \n",
    "\n",
    "For the TLDR:\n",
    "\n",
    "$$\n",
    "F_{\\alpha, n, m} = \\frac{1}{F_{1-\\alpha, m, n}}\n",
    "$$\n",
    "\n",
    "Note the:\n",
    "\n",
    "- $\\alpha$ on the LHS (left hand side) and a $(1-\\alpha)$ on the RHS (right hand side)\n",
    "- the swapping of $n$, and $m$\n",
    "\n",
    "With that you can rephrase Theorem 9.3.1\n",
    "\n",
    "\\begin{align*}\n",
    "H_{a}&: \\sigma^{2}_{X} \\gt \\sigma^{2}_{X} \\Rightarrow s_{X}^2/s_{Y}^2 \\ge F_{1-\\alpha, n-1, m-1} \n",
    "\\\\\n",
    "H_{b}&: \\sigma^{2}_{X} \\lt \\sigma^{2}_{X} \\Rightarrow s_{X}^2/s_{Y}^2 \\le F_{\\alpha, n-1, m-1} \n",
    "\\\\\n",
    "H_{c}&: \\sigma^{2}_{X} \\ne \\sigma^{2}_{X} \\Rightarrow (1) \\ \\ s_{X}^2/s_{Y}^2 \\le F_{\\frac{\\alpha}{2}, n-1, m-1}\n",
    "\\\\\n",
    "&\\phantom{: \\sigma^{2}_{X} \\ne \\sigma^{2}_{X} \\Rightarrow }\n",
    "\\text{or}\n",
    "\\\\\n",
    "&\\phantom{: \\sigma^{2}_{X} \\ne \\sigma^{2}_{X} \\Rightarrow }\n",
    "(2) \\ \\ \\ s_{X}^2/s_{Y}^2 \\ge F_{1-\\frac{\\alpha}{2}, n-1, m-1}\n",
    "\\end{align*}\n",
    "\n",
    "And I think this looks way nicer. The inequality directions all match.\n",
    "\n",
    "You can do this by applying Theorem 9.3.1 to some data, swap the labels of data, then apply Theorem 9.3.1 again. You then will have two expression for rejecting the data that must be the same.\n",
    "\n",
    "It's a bit tedious to do for all cases, but the algebra isn't so bad.\n",
    "\n",
    "I'll share a code block where I verify this numerically at the end."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The numerical results were almost perfect. When there was a difference, it began at 12 decimal places in on average.\n",
    "\n",
    "Two interesting consequences of that are\n",
    "$$\n",
    "\\text{cdf}_{F_{n,m}}(x) = 1 - \\text{cdf}_{F_{m,n}}\\left(\\frac{1}{x}\\right)\n",
    "$$\n",
    "$$\n",
    "\\text{pdf}_{F_{n,m}}(x) = \\text{pdf}_{F_{m,n}}\\left(\\frac{1}{x}\\right) \\cdot \\frac{1}{x^2}\n",
    "$$\n",
    "Note the swapping of $n$ and $m$ from the LHS and the RHS.\n",
    "\n",
    "Remember that $F_{\\alpha,n,m}$ is a percentile, which is a statement of what is the input value to the cdf to give an output of $\\alpha$ (the inverse function of the cdf). Which is how the cdf equation was postulated. Taking the derivative of the cdf gives the pdf.\n",
    "\n",
    "The pdf is the easiest to verify algebraically since we can look up the form of the pdf of $F_{n,m}$. Tedious, but not hard.\n",
    "\n",
    "I've also verified numerically the two above statements the same way as the percentile equation with the same results.\n",
    "\n",
    "A follow up discussion is how I how think about the two sample test of variance in Theorem 9.3.1. Think about rewriting the hypothesis and the alternative statements as ratios.\n",
    "\n",
    "\\begin{align*}\n",
    "H_{0}&: \\frac{\\sigma^{2}_{X}}{\\sigma^{2}_{Y}} = 1\n",
    "\\\\\n",
    "H_{a}&: \\frac{\\sigma^{2}_{X}}{\\sigma^{2}_{Y}} \\gt 1 \\Rightarrow \\frac{s_{X}^2}{s_{Y}^2} \\ge F_{1-\\alpha, n-1, m-1} \n",
    "\\\\\n",
    "H_{b}&: \\frac{\\sigma^{2}_{X}}{\\sigma^{2}_{Y}} \\lt 1 \\Rightarrow \\frac{s_{X}^2}{s_{Y}^2} \\le F_{\\alpha, n-1, m-1} \n",
    "\\\\\n",
    "H_{c}&: \\frac{\\sigma^{2}_{X}}{\\sigma^{2}_{Y}} \\ne 1 \\Rightarrow (1) \\ \\ \\frac{s_{X}^2}{s_{Y}^2} \\le F_{\\frac{\\alpha}{2}, n-1, m-1}\n",
    "\\\\\n",
    "&\\phantom{: \\sigma^{2}_{X} \\ne \\sigma^{2}_{X} \\Rightarrow }\n",
    "\\text{or}\n",
    "\\\\\n",
    "&\\phantom{: \\sigma^{2}_{X} \\ne \\sigma^{2}_{X} \\Rightarrow }\n",
    "(2) \\ \\ \\ \\frac{s_{X}^2}{s_{Y}^2} \\ge F_{1-\\frac{\\alpha}{2}, n-1, m-1}\n",
    "\\end{align*}\n",
    "\n",
    "Can you see how the $\\frac{\\sigma^{2}_{X}}{\\sigma^{2}_{Y}}$ is being tested with $\\frac{s_{X}^2}{s_{Y}^2}$, how the inequalities match up to a random variable $\\frac{\\chi_{n-1}^2}{\\chi_{m-1}^2}=F_{n-1,m-1}$\n",
    "\n",
    "Furthermore, you can see a \"f-score\" $f=\\frac{s_{X}^2}{s_{Y}^2}$ being compared to critical values in the F-distribution like all our other sample tests.\n",
    "\n",
    "If you look at Theorem 9.3.1 as it is stated with the through the lens of $\\frac{\\sigma^{2}_{Y}}{\\sigma^{2}_{X}}$ (this is inverted from what we were just looking at), then all the inequalities make a lot of sense.\n",
    "\n",
    "I thought this was all fascinating, and was asked to share this. I hope you found it as interesting as I did. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Percent Error  :  0.02%\n",
      "Percent Correct: 99.98%\n",
      "Error Stats:\n",
      "      Min: 1.3322676295501878e-15\n",
      "      Mean: 1.1796952303910757e-12\n",
      "      Max: 1.4551915228366852e-11\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import scipy as sp\n",
    "import scipy.stats\n",
    "\n",
    "rng = np.random.default_rng()\n",
    "\n",
    "num_sims = 100_000\n",
    "epsilon = 1e-15 \n",
    "int_low = 1\n",
    "int_high = 1_000\n",
    "\n",
    "error_count = 0\n",
    "error_values = []\n",
    "\n",
    "for _ in range(num_sims):\n",
    "    alpha = rng.random()\n",
    "    dn, dm = rng.integers(int_low, int_high, size=2)\n",
    "    lhs = sp.stats.f.ppf(alpha, dn, dm)\n",
    "    inv_rhs = sp.stats.f.ppf(1-alpha, dm, dn)\n",
    "    rhs = 1 / inv_rhs\n",
    "    \n",
    "    abs_error = abs(lhs-rhs)\n",
    "\n",
    "    if abs_error > epsilon:\n",
    "        error_count += 1\n",
    "        error_values.append(abs_error)\n",
    "\n",
    "correct_count = num_sims-error_count\n",
    "\n",
    "print(f\"Percent Error  : {100*(error_count/num_sims):5.2f}%\")\n",
    "print(f\"Percent Correct: {100*(correct_count/num_sims):5.2f}%\")\n",
    "print( \"Error Stats:\")\n",
    "print(f\"      Min: {min(error_values)}\")\n",
    "print(f\"      Mean: {np.mean(error_values)}\")\n",
    "print(f\"      Max: {max(error_values)}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"I'm making this post as an interesting exploration of Theorem 9.3.1 and the F distribution.\n",
	"\n",
	"This came about from a question Casey asked in class. Basically can we rephrase the theorem so that we have $s_{X}^2/s_{Y}^2$ rather than $s_{X}^2/s_{Y}^2$. Dr. Leboeuf made an interesting comment about the data labels being arbitrary but the inference models being specific. So I have given it some thought. \n",
	"\n",
	"For the TLDR:\n",
	"\n",
	"$$\n",
	"F_{\\alpha, n, m} = \\frac{1}{F_{1-\\alpha, m, n}}\n",
	"$$\n",
	"\n",
	"Note the:\n",
	"\n",
	"- $\\alpha$ on the LHS (left hand side) and a $(1-\\alpha)$ on the RHS (right hand side)\n",
	"- the swapping of $n$, and $m$\n",
	"\n",
	"With that you can rephrase Theorem 9.3.1\n",
	"\n",
	"\\begin{align*}\n",
	"H_{a}&: \\sigma^{2}_{X} \\gt \\sigma^{2}_{X} \\Rightarrow s_{X}^2/s_{Y}^2 \\ge F_{1-\\alpha, n-1, m-1} \n",
	"\\\\\n",
	"H_{b}&: \\sigma^{2}_{X} \\lt \\sigma^{2}_{X} \\Rightarrow s_{X}^2/s_{Y}^2 \\le F_{\\alpha, n-1, m-1} \n",
	"\\\\\n",
	"H_{c}&: \\sigma^{2}_{X} \\ne \\sigma^{2}_{X} \\Rightarrow (1) \\ \\ s_{X}^2/s_{Y}^2 \\le F_{\\frac{\\alpha}{2}, n-1, m-1}\n",
	"\\\\\n",
	"&\\phantom{: \\sigma^{2}_{X} \\ne \\sigma^{2}_{X} \\Rightarrow }\n",
	"\\text{or}\n",
	"\\\\\n",
	"&\\phantom{: \\sigma^{2}_{X} \\ne \\sigma^{2}_{X} \\Rightarrow }\n",
	"(2) \\ \\ \\ s_{X}^2/s_{Y}^2 \\ge F_{1-\\frac{\\alpha}{2}, n-1, m-1}\n",
	"\\end{align*}\n",
	"\n",
	"And I think this looks way nicer. The inequality directions all match.\n",
	"\n",
	"You can do this by applying Theorem 9.3.1 to some data, swap the labels of data, then apply Theorem 9.3.1 again. You then will have two expression for rejecting the data that must be the same.\n",
	"\n",
	"It's a bit tedious to do for all cases, but the algebra isn't so bad.\n",
	"\n",
	"I'll share a code block where I verify this numerically at the end."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"The numerical results were almost perfect. When there was a difference, it began at 12 decimal places in on average.\n",
	"\n",
	"Two interesting consequences of that are\n",
	"$$\n",
	"\\text{cdf}_{F_{n,m}}(x) = 1 - \\text{cdf}_{F_{m,n}}\\left(\\frac{1}{x}\\right)\n",
	"$$\n",
	"$$\n",
	"\\text{pdf}_{F_{n,m}}(x) = \\text{pdf}_{F_{m,n}}\\left(\\frac{1}{x}\\right) \\cdot \\frac{1}{x^2}\n",
	"$$\n",
	"Note the swapping of $n$ and $m$ from the LHS and the RHS.\n",
	"\n",
	"Remember that $F_{\\alpha,n,m}$ is a percentile, which is a statement of what is the input value to the cdf to give an output of $\\alpha$ (the inverse function of the cdf). Which is how the cdf equation was postulated. Taking the derivative of the cdf gives the pdf.\n",
	"\n",
	"The pdf is the easiest to verify algebraically since we can look up the form of the pdf of $F_{n,m}$. Tedious, but not hard.\n",
	"\n",
	"I've also verified numerically the two above statements the same way as the percentile equation with the same results.\n",
	"\n",
	"A follow up discussion is how I how think about the two sample test of variance in Theorem 9.3.1. Think about rewriting the hypothesis and the alternative statements as ratios.\n",
	"\n",
	"\\begin{align*}\n",
	"H_{0}&: \\frac{\\sigma^{2}_{X}}{\\sigma^{2}_{Y}} = 1\n",
	"\\\\\n",
	"H_{a}&: \\frac{\\sigma^{2}_{X}}{\\sigma^{2}_{Y}} \\gt 1 \\Rightarrow \\frac{s_{X}^2}{s_{Y}^2} \\ge F_{1-\\alpha, n-1, m-1} \n",
	"\\\\\n",
	"H_{b}&: \\frac{\\sigma^{2}_{X}}{\\sigma^{2}_{Y}} \\lt 1 \\Rightarrow \\frac{s_{X}^2}{s_{Y}^2} \\le F_{\\alpha, n-1, m-1} \n",
	"\\\\\n",
	"H_{c}&: \\frac{\\sigma^{2}_{X}}{\\sigma^{2}_{Y}} \\ne 1 \\Rightarrow (1) \\ \\ \\frac{s_{X}^2}{s_{Y}^2} \\le F_{\\frac{\\alpha}{2}, n-1, m-1}\n",
	"\\\\\n",
	"&\\phantom{: \\sigma^{2}_{X} \\ne \\sigma^{2}_{X} \\Rightarrow }\n",
	"\\text{or}\n",
	"\\\\\n",
	"&\\phantom{: \\sigma^{2}_{X} \\ne \\sigma^{2}_{X} \\Rightarrow }\n",
	"(2) \\ \\ \\ \\frac{s_{X}^2}{s_{Y}^2} \\ge F_{1-\\frac{\\alpha}{2}, n-1, m-1}\n",
	"\\end{align*}\n",
	"\n",
	"Can you see how the $\\frac{\\sigma^{2}_{X}}{\\sigma^{2}_{Y}}$ is being tested with $\\frac{s_{X}^2}{s_{Y}^2}$, how the inequalities match up to a random variable $\\frac{\\chi_{n-1}^2}{\\chi_{m-1}^2}=F_{n-1,m-1}$\n",
	"\n",
	"Furthermore, you can see a \"f-score\" $f=\\frac{s_{X}^2}{s_{Y}^2}$ being compared to critical values in the F-distribution like all our other sample tests.\n",
	"\n",
	"If you look at Theorem 9.3.1 as it is stated with the through the lens of $\\frac{\\sigma^{2}_{Y}}{\\sigma^{2}_{X}}$ (this is inverted from what we were just looking at), then all the inequalities make a lot of sense.\n",
	"\n",
	"I thought this was all fascinating, and was asked to share this. I hope you found it as interesting as I did. "
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"Percent Error : 0.02%\n",
	"Percent Correct: 99.98%\n",
	"Error Stats:\n",
	" Min: 1.3322676295501878e-15\n",
	" Mean: 1.1796952303910757e-12\n",
	" Max: 1.4551915228366852e-11\n"
	]
	}
	],
	"source": [
	"import numpy as np\n",
	"import scipy as sp\n",
	"import scipy.stats\n",
	"\n",
	"rng = np.random.default_rng()\n",
	"\n",
	"num_sims = 100_000\n",
	"epsilon = 1e-15 \n",
	"int_low = 1\n",
	"int_high = 1_000\n",
	"\n",
	"error_count = 0\n",
	"error_values = []\n",
	"\n",
	"for _ in range(num_sims):\n",
	" alpha = rng.random()\n",
	" dn, dm = rng.integers(int_low, int_high, size=2)\n",
	" lhs = sp.stats.f.ppf(alpha, dn, dm)\n",
	" inv_rhs = sp.stats.f.ppf(1-alpha, dm, dn)\n",
	" rhs = 1 / inv_rhs\n",
	" \n",
	" abs_error = abs(lhs-rhs)\n",
	"\n",
	" if abs_error > epsilon:\n",
	" error_count += 1\n",
	" error_values.append(abs_error)\n",
	"\n",
	"correct_count = num_sims-error_count\n",
	"\n",
	"print(f\"Percent Error : {100*(error_count/num_sims):5.2f}%\")\n",
	"print(f\"Percent Correct: {100*(correct_count/num_sims):5.2f}%\")\n",
	"print( \"Error Stats:\")\n",
	"print(f\" Min: {min(error_values)}\")\n",
	"print(f\" Mean: {np.mean(error_values)}\")\n",
	"print(f\" Max: {max(error_values)}\")"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.8.5"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 4
	}
No results found