{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
"# Derivation of the Normal Equations \n", | |
"\n", | |
"The Normal Equations, represented in matrix form as\n", | |
"\n", | |
"\n", | |
"$$\n", | |
"(X^{T}X)\\hat{\\beta} = X^{T}y\n", | |
"$$\n", | |
"\n", | |
"are utilized in determining coefficent values associated with multiple linear regression models. The matrix representation is a compact form of of the full model specification, which is commonly represented as\n", | |
"\n", | |
"$$\n", | |
"y = \\beta_{0} + \\beta_{1}x_{1} + \\beta_{2}x_{2} + \\cdots + \\beta_{k}x_{k} + \\varepsilon\n", | |
"$$\n", | |
"\n", | |
"where $\\varepsilon$ represents the error term, and \n", | |
"\n", | |
"$$\\sum_{i=1}^{n} \\varepsilon_{i} = 0.$$\n", | |
"\n", | |
"For a dataset with $n$ records by $k$ explanatory variables per record, the components of the Normal Equations are:\n", | |
"\n", | |
"* $\\hat{\\beta} = (\\hat{\\beta}_{0}, \\hat{\\beta}_{1},...,\\hat{\\beta}_{k})^{T}$, a vector of $(k+1)$ coefficents (one for each of the k explanatory variables plus one for the intercept term) \n", | |
"\n", | |
"* ${X}$, an $n$ by $(k+1)$-dimensional matrix of explanatory variables, with the first column consisting entirely of 1's \n", | |
"\n", | |
"* ${y} = (y_{1}, y_{2},...,y_{n})$, the response variable\n", | |
"\n", | |
"\n", | |
"The task is to solve for the $(k+1)$ $\\beta_{j}$'s such that $\\hat{\\beta}_{0}, \\hat{\\beta}_{1},...,\\hat{\\beta}_{k}$ minimize\n", | |
"\n", | |
"$$\n", | |
"\\sum_{i=1}^{n} \\hat{\\varepsilon}^{2}_{i} = \\sum_{i=1}^{n} (y_{i} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{i1} - \\hat{\\beta}_{2}x_{i2} - \\cdots - \\hat{\\beta}_{k}x_{ik})^2.\n", | |
"$$\n", | |
"\n", | |
"\n", | |
"The Normal Equations can be derived using both Least-Squares and Maximum likelihood Estimation. We'll demonstrate both approaches.\n", | |
"\n", | |
"\n", | |
"### Least-Squares Derivation\n", | |
"\n", | |
"An advantage of the Least-Squares approach is that no distributional assumption is necessary (unlike Maximum Likelihood Estimation). For $\\hat{\\beta}_{0}, \\hat{\\beta}_{1},...,\\hat{\\beta}_{k}$, we seek estimators that minimize the sum of squared deviations between the $n$ response variables and the predicted values, $\\hat{y}$. The objective is to minimize\n", | |
"\n", | |
"\n", | |
"$$\n", | |
"\\sum_{i=1}^{n} \\hat{\\varepsilon}^{2}_{i} = \\sum_{i=1}^{n} (y_{i} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{i1} - \\hat{\\beta}_{2}x_{i2} - \\cdots - \\hat{\\beta}_{k}x_{ik})^2.\n", | |
"$$\n", | |
"\n", | |
"\n", | |
"Using the more-compact matrix notation, our model can be represented as $y = X^{T}\\beta + \\varepsilon$. Isolating and squaring the error term yields\n", | |
"\n", | |
"$$\n", | |
"\\hat \\varepsilon^T \\hat \\varepsilon = \\sum_{i=1}^{n} (y - X\\hat{\\beta})^{T}(y - X\\hat{\\beta}).\n", | |
"$$\n", | |
"\n", | |
"Expanding the right-hand side and combining terms results in\n", | |
"\n", | |
"$$\n", | |
"\\hat \\varepsilon^T \\hat \\varepsilon = y^{T}y - 2y^{T}X\\hat{\\beta} + \\hat{\\beta}X^{T}X\\hat{\\beta}\n", | |
"$$\n", | |
"\n", | |
"\n", | |
"To find the value of $\\hat{\\beta}$ that minimizes $\\hat \\varepsilon^T \\hat \\varepsilon$, we differentiate $\\hat \\varepsilon^T \\hat \\varepsilon$ with respect to \n", | |
"$\\hat{\\beta}$, and set the result to zero:\n", | |
"\n", | |
"\n", | |
"$$\n", | |
"\\frac{\\partial \\hat{\\varepsilon}^{T}\\hat{\\varepsilon}}{\\partial \\hat{\\beta}} = -2X^{T}y + 2X^{T}X\\hat{\\beta} = 0\n", | |
"$$\n", | |
"\n", | |
"Which can then be solved for $\\hat{\\beta}$:\n", | |
"\n", | |
"\n", | |
"$$\n", | |
"\\hat{\\beta} = {(X^{T}X)}^{-1}{X}^{T}y\n", | |
"$$\n", | |
"\n", | |
"Since $\\hat{\\beta}$ minimizes the sum of squares, $\\hat{\\beta}$ is called the *Least-Squares Estimator.* \n", | |
" \n", | |
" \n", | |
" \n", | |
"### Maximum Likelihood Derivation\n", | |
"\n", | |
"For the Maximum Likelihood derivation, $X$, $y$ and $\\hat{\\beta}$ are the same as described in the Least-Squares derivation, and the model still follows the form\n", | |
"\n", | |
"$$\n", | |
"y = X^{T}\\beta + \\varepsilon\n", | |
"$$ \n", | |
"\n", | |
"but here we assume the $\\varepsilon_{i}$ are $iid$ and follow a zero-mean normal distribution:\n", | |
"\n", | |
"$$\n", | |
"N(\\varepsilon_{i}; 0, \\sigma^{2}) = \\frac{1}{\\sqrt{2\\pi\\sigma^{2}}} e^{- \\frac{(y_{i}-X^{T}\\hat{\\beta})^{2}}{2\\sigma^{2}}}.\n", | |
"$$\n", | |
"\n", | |
"In addition, the responses, $y_{i}$, are each assumed to follow a normal distribution. For $n$ observations, the likelihood function is\n", | |
"\n", | |
"$$\n", | |
"L(\\beta) = \\Big(\\frac{1}{\\sqrt{2\\pi\\sigma^{2}}}\\Big)^{n} e^{-(y-X\\beta)^{T}(y-X\\beta)/2\\sigma^{2}}.\n", | |
"$$\n", | |
"\n", | |
"\n", | |
"$Ln(L(\\beta))$, the Log-Likelihood, is therefore\n", | |
"\n", | |
"\n", | |
"$$\n", | |
"Ln(L(\\beta)) = -\\frac{n}{2}Ln(2\\pi) -\\frac{n}{2}Ln(\\sigma^{2})-\\frac{1}{2\\sigma^{2}}(y-X\\beta)^{T}(y-X\\beta).\n", | |
"$$\n", | |
"\n", | |
"Taking derivatives with respect to $\\beta$ and setting the result equal to zero results in\n", | |
"\n", | |
"\n", | |
"$$\n", | |
"\\frac{\\partial Ln(L(\\beta))}{\\partial \\beta} = -2X^{T}y -2X^{T}X\\beta = 0.\n", | |
"$$\n", | |
"\n", | |
"Upon rearranging and solving for $\\beta$, we obtain\n", | |
"\n", | |
"$$\n", | |
"\\hat{\\beta} = {(X^{T}X)}^{-1}{X}^{T}y,\n", | |
"$$\n", | |
"\n", | |
"which is identical to the result obtained from the Least-Squares approach. \n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n" | |
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}