Last active
April 8, 2016 15:21
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Machine Learning Assignment 1\n", | |
"\n", | |
"Andy Chase \n", | |
"Brandon Edwards \n", | |
"Daniel Kirkpatrick \n", | |
"\n", | |
"April 9th 2015" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"First, we import [pandas](http://pandas.pydata.org) and [numpy](https://docs.scipy.org/doc/). These are very popular numerical computation libraries for the Python programming language. Matplotlib is also used for a simple plot in question 5." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"%matplotlib inline" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import matplotlib" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import matplotlib.pyplot as plt" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import pandas" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"I converted the data to CSV. Here pandas reads the files and imports them as [DataFrames][1]. DataFrames were used because they include column headers, which makes the results easy to read.\n", | |
"\n", | |
"\n", | |
"[1]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"d = pandas.read_csv('housing_train.csv')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"test_data = pandas.read_csv('housing_test.csv')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The answers are \"popped\" off (removed and saved). Since there's only one column, Pandas will make these [series][1].\n", | |
"\n", | |
"[1]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"test_answers = test_data.pop('MEDV')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"y = d.pop('MEDV')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Question 1\n", | |
"\n", | |
"Next we need to add the \"dummy\" column with all ones. This is done in pandas like so:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"d[\"dummy\"] = 1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"test_data[\"dummy\"] = 1" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Pandas doesn't support inverting DataFrames (which are kind of like matrices). Therefore, to accomplish this we convert the DataFrame into a NumPy array and use the NumPy inversion function. I figured this all out by just knowing what I wanted to accomplish and googling how to perform the task, reading StackOverflow, etc.\n", | |
"\n", | |
"I made a quick function here to make it clear later what's happening." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"def data_frame_invert(df):\n", | |
"    # .values returns the underlying NumPy array; as_matrix() was removed in pandas 1.0\n", | |
"    return numpy.linalg.inv(df.values)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Question 2\n", | |
"\n", | |
"This performs the maths: $w = (X^T X)^{-1} X^T Y$" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"output_w = data_frame_invert(d.T.dot(d)).dot(d.T.dot(y))" | |
] | |
}, | |
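{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Explicitly inverting $X^T X$ works here, but it can be numerically fragile when columns are nearly collinear. As a hedged aside (not part of the original assignment), the sketch below uses made-up data and the hypothetical names `X_demo`, `y_demo`, and `true_w` to check that `numpy.linalg.solve` and `numpy.linalg.lstsq` recover the same weights as the inverse-based formula." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy\n", | |
"\n", | |
"# Synthetic data for illustration only (not the housing data).\n", | |
"rng = numpy.random.RandomState(0)\n", | |
"X_demo = numpy.hstack([rng.rand(50, 3), numpy.ones((50, 1))])  # last column is the dummy\n", | |
"true_w = numpy.array([2.0, -1.0, 0.5, 4.0])\n", | |
"y_demo = X_demo.dot(true_w)  # noise-free targets\n", | |
"\n", | |
"# Normal equations with an explicit inverse, as in the cell above.\n", | |
"w_inv = numpy.linalg.inv(X_demo.T.dot(X_demo)).dot(X_demo.T.dot(y_demo))\n", | |
"# Same linear system solved without forming the inverse.\n", | |
"w_solve = numpy.linalg.solve(X_demo.T.dot(X_demo), X_demo.T.dot(y_demo))\n", | |
"# Least-squares solver applied directly to X and y.\n", | |
"w_lstsq = numpy.linalg.lstsq(X_demo, y_demo, rcond=None)[0]\n", | |
"\n", | |
"print(numpy.allclose(w_inv, w_solve), numpy.allclose(w_solve, w_lstsq))" | |
] | |
}, | |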
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now I convert the output (which ends up being a numpy array) back into a pandas Series so that I can keep the column names." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"output_w = pandas.Series(output_w, index=d.columns)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"CRIM -0.101137\n", | |
"ZN 0.045894\n", | |
"INDUS -0.002730\n", | |
"CHAS 3.072013\n", | |
"NOX -17.225407\n", | |
"RM 3.711252\n", | |
"AGE 0.007159\n", | |
"DIS -1.599002\n", | |
"RAD 0.373623\n", | |
"TAX -0.015756\n", | |
"PTRATIO -1.024177\n", | |
"B 0.009693\n", | |
"LSTAT -0.585969\n", | |
"dummy 39.584321\n", | |
"dtype: float64" | |
] | |
}, | |
"execution_count": 15, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"output_w" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now I write a quick function to calculate the SSE. The code here is kind of opaque if you've never seen Python, so let's break it down line by line.\n", | |
"\n", | |
"    def get_sse(d, y, output_w):\n", | |
"\n", | |
"This first line just says \"here's a function with the parameters `d`, `y`, and `output_w`\".\n", | |
"\n", | |
"        for answer, (_, testing_row) in zip(y, d.iterrows()):\n", | |
"\n", | |
"`d.iterrows()` takes the DataFrame passed in as `d` and goes through it one row at a time. If we didn't write `iterrows`, pandas would iterate over the column labels instead, which isn't what we want.\n", | |
"\n", | |
"Each item is going to be in the form `(index, row)`. I don't need to worry about the index, so I used the variable `_` to indicate that I'm planning on ignoring it. `_` isn't special; it's a variable like `d` or `y`, but convention states that you use `_` when you are ignoring something.\n", | |
"\n", | |
"`zip(y, d.iterrows())` pairs each data row with the matching value in the answers. Think of it like a zipper: the left side of the zipper is the answer values and the right side is the data rows.\n", | |
"\n", | |
"            predicted_value = testing_row.dot(output_w)\n", | |
"            yield (answer - predicted_value)**2\n", | |
"\n", | |
"`x**2` in Python means $x^2$. The `yield` keyword makes this function a generator: instead of returning a single value, it produces one squared error per row, on demand (imagine a list that's generated as it's used)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"def get_sse(d, y, output_w):\n", | |
"    for answer, (_, testing_row) in zip(y, d.iterrows()):\n", | |
"        predicted_value = testing_row.dot(output_w)\n", | |
"        yield (answer - predicted_value)**2" | |
] | |
}, | |
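{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The generator above computes one squared error per row. As a sketch (using made-up numbers and the hypothetical names `demo_X`, `demo_y`, `demo_w`, and `get_sse_rows`), the same total can be computed in one vectorized expression, which is usually faster on large DataFrames." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import pandas\n", | |
"\n", | |
"# Tiny made-up example; these values are not from the housing data.\n", | |
"demo_X = pandas.DataFrame({'a': [1.0, 2.0, 3.0], 'dummy': [1.0, 1.0, 1.0]})\n", | |
"demo_y = pandas.Series([3.0, 5.0, 8.0])\n", | |
"demo_w = pandas.Series({'a': 2.0, 'dummy': 1.0})\n", | |
"\n", | |
"def get_sse_rows(d, y, w):\n", | |
"    # Row-by-row version, mirroring get_sse above.\n", | |
"    for answer, (_, row) in zip(y, d.iterrows()):\n", | |
"        yield (answer - row.dot(w)) ** 2\n", | |
"\n", | |
"sse_loop = sum(get_sse_rows(demo_X, demo_y, demo_w))\n", | |
"# Vectorized version: predict every row at once, then sum the squared residuals.\n", | |
"sse_vectorized = ((demo_y - demo_X.dot(demo_w)) ** 2).sum()\n", | |
"\n", | |
"print(sse_loop, sse_vectorized)" | |
] | |
}, | |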
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Question 3\n", | |
"\n", | |
"Next we sum up all the values in that generated list to get the sum of squared errors. The `sum` function is a built-in function in Python that can sum up a list or generator.\n", | |
"\n", | |
"We are using the `test_data` and `test_answers` to calculate the error here, not the training data." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1675.2309659483587" | |
] | |
}, | |
"execution_count": 17, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sum(get_sse(test_data, test_answers, output_w))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Question 4\n", | |
"\n", | |
"Repeat the experiment, but pop off the dummy variable and don't use it this time." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [], | |
"source": [ | |
"_ = d.pop(\"dummy\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"_ = test_data.pop(\"dummy\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"output_w = data_frame_invert(d.T.dot(d)).dot(d.T.dot(y))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"output_w = pandas.Series(output_w, index=d.columns)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1797.625624999007" | |
] | |
}, | |
"execution_count": 22, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sum(get_sse(test_data, test_answers, output_w))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": true | |
}, | |
"source": [ | |
"## Question 5" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"d[\"dummy\"] = 1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"test_data[\"dummy\"] = 1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"identity = numpy.identity(len(d.columns))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def get_modifer_sse(modifer_integer):\n", | |
"    modifier = modifer_integer*identity\n", | |
"    output_w = data_frame_invert(d.T.dot(d) + modifier).dot(d.T.dot(y))\n", | |
"    output_w = pandas.Series(output_w, index=d.columns)\n", | |
"    return sum(get_sse(test_data, test_answers, output_w))" | |
] | |
}, | |
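{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The function above appears to implement the regularized form of the normal equations, with `modifer_integer` playing the role of $\\lambda$. As a sketch of where that comes from, assuming the modified objective is the SSE plus $\\lambda |w|^2$ (the objective itself is not written out in this notebook):\n", | |
"\n", | |
"$$E(w) = (y - Xw)^T (y - Xw) + \\lambda w^T w$$\n", | |
"\n", | |
"$$\\nabla_w E = -2 X^T (y - Xw) + 2 \\lambda w = 0$$\n", | |
"\n", | |
"$$(X^T X + \\lambda I) w = X^T y \\quad\\Rightarrow\\quad w = (X^T X + \\lambda I)^{-1} X^T y$$\n", | |
"\n", | |
"Setting $\\lambda = 0$ recovers the formula from Question 2. Note that `identity` covers every column, so the `dummy` (intercept) weight is penalized along with the others." | |
] | |
}, | |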
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1661.8917627327075" | |
] | |
}, | |
"execution_count": 27, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"get_modifer_sse(.5)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"x_axis = numpy.arange(0.0, 2.0, 0.01)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"y_axis = [(i, get_modifer_sse(i)) for i in x_axis]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(0.20999999999999999, 1649.5930012875315)" | |
] | |
}, | |
"execution_count": 30, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"min(y_axis, key=lambda _: _[1])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEACAYAAAC6d6FnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XuYVNWV9/HvUiASxWjUCBGRoE0ElYiAEFRoTVR0EsU7\nqGiIowJGZ5KM0eRNBB0TERkjOkq8NYiXRlSCEgE1QgEjQquooKABRQUUBAKCXKTpXu8f+2DKtulL\ndVeduvw+z1MPXfucOrWqnmKvc/beZ29zd0REpPDsFncAIiISDyUAEZECpQQgIlKglABERAqUEoCI\nSIFSAhARKVA1JgAzKzGz1Wa2MKlsvJm9Hj2WmdnrUfnJZvaqmS2I/j0x6TVdzGyhmS0xs1Hp+zgi\nIlJXVtN9AGZ2AvA5MM7dj6pm+0hgg7vfbGZHA6vcfZWZHQE85+6to/3KgF+4e5mZTQHudPdp6fhA\nIiJSNzVeAbj7bGB9ddvMzIDzgdJo3zfcfVW0eRHQ3MyamlkroIW7l0XbxgF9GyN4ERFJXUP6AE4A\nVrv7e9VsOwd4zd3LgYOAFUnbVkZlIiISoyYNeG1/4LGqhVHzz3Dg5AYcW0RE0iylBGBmTYCzgGOq\nlLcGJgID3H1ZVLwSaJ20W+uorLrjamIiEZEUuLvV9zWpNgH9GFjs7h/vLDCzfYBngevc/eWkoD4B\nNppZ96jfYAAwaVcHdnc9GukxdOjQ2GPIl4e+S32fcTx27HASCeeqq5yWLZ3OnZ1bbnGWLv3qfqmq\n8QrAzEqB3sB+ZrYcuMHdxwAXEHX+JvkFcCgw1MyGRmUnu/taYAgwFmgOTHGNABIRqZY7zJ0Ljz0G\nTz4JBx4I558Ps2ZBUVHjvleNCcDd+++ifGA1ZTcDN+9i/9eArw0jFRGRYNEiePRRKC2Fb3wDLrwQ\nZs6E9u3T954N6QSWLFdcXBx3CHlD32Xj0vcZfPQRjB8fzvbXroX+/eGpp+Doo8Hq3aJffzXeCJZp\nZubZFI+ISGNbuzY07Tz2WDjrP+eccLZ/wgmwW4q9smaGp9AJrAQgIpJmW7fC00+HJp5Zs+C000Kl\nf+qpobmnoZQARESyiDvMmQMPPRTO+Lt2hQEDoG9faNGicd8r1QSgPgARkUb00Ucwblx47LYbXHop\nLFgArVvX/tpMUwIQEWmgzZth4kQYOxbeeCMM23z4YTj22Mx05qZKCUBEJAWVlTB7dmji+etfoWdP\nGDQIfvpT2GOPuKOrG/UBiIjUw6pV4Uz/gQdCRT9wIFx0EbRsGV9M6gMQEUmTigp44QW47z6YMSMM\n3Xz00exv4qmNEoCIyC6sWAElJfDgg3DAAXD55eHsf++9446scSgBiIgk2bEDnn0W7r8/DOPs1w8m\nTYLOneOOrPEpAYiIEIZv3ndfOONv2zac7T/+OOy5Z9yRpY8SgIgUrMpKePFFuPvuMKLnoovg+efh\nyCPjjiwzlABEpOBs2BDa8kePDiN5rroqdOrm89l+dZQARKRgvPlmONt/4okwH09JSRi/n8sjeRpC\nCUBE8tr27WGK5bvvhg8/hCuvhMWL4x23ny2UAEQkL61dGzp1774bDj8cfv3rcJduE9V6X9JXISJ5\n5e23YdSo0Mxz9tkwbRocpfUIq6UEICI5r7IyVPR33AELF8KQIfDuu/Cd78QdWXarcf0ZMysxs9Vm\ntjCpbLyZvR49lpnZ61H5fmY2w8w2mdldVY7TxcwWmtkSMxuVno8iIoVm8+YwkqdjR/h//w8uvhg+\n+AD+8AdV/nVR2wJkY4A+yQXu3s/dO7t7Z+Cp6AGwFfg98F/VHGc0cJm7FwFFZtanmn1EROpk9epQ\n4R9ySBi3f++9MH8+XHJJ46ywVShqTADuPhtYX902MzPgfKA02neLu78EfFFlv1ZAC3cvi4rGAX0b\nGLeIFKAlS8IonsMPh/XrYe7cMBVz796FO5Sz
IVJcghiAE4DV7v5elfKq8zkfBKxIer4yKhMRqZOy\nMjj33DBm/8ADQ/v+PffAYYfFHVlua0gncH/gscYKREQkmTtMnQojRsCyZWEY59ixsNdecUeWP1JK\nAGbWBDgLOKYOu68EklfDbB2VVWvYsGFf/l1cXExxcXEqIYpIjiovh/HjQ8W/227wm9+EJRabNo07\nsuyRSCRIJBINPk6tK4KZWVtgsrsflVTWB7jO3U+sZv+fAV3c/eqksnnANUAZ8Cxwp7tPq+a1WhFM\npEB98UU4wx8+PMzGef31cMopatuvi7SsCGZmpUBvYD8zWw7c4O5jgAuIOn+r7P8B0AJoZmZ9gZPd\n/R1gCDAWaA5Mqa7yF5HCtGVLWF5xxAjo1AkeeQSOOy7uqAqD1gQWkVhs2gR/+Qvcfjv06BGGdXbt\nGndUuUlrAotITtiwAe66KzxOOimM49dUDfFoyDBQEZE6W7cOfv/7MHTzvffCAizjx6vyj5MSgIik\n1YYNMHQofP/74Q7esrLQ2fv978cdmSgBiEhabNoEf/wjFBWF9XbLysJC6+3axR2Z7KQEICKNassW\nuO220NSzaBG89BKMGaOKPxupE1hEGsW2bWFStuHD4fjjYfp0OOKIuKOSmigBiEiDbN8ODz4YmnuO\nOSZM33D00XFHJXWhBCAiKamshAkTwvj9ww6DiRPh2GPjjkrqQwlAROrt73+H664Lc/Xcf38Yzy+5\nRwlAROps/vwwR8+yZfCnP4UpmjVXT+7SKCARqdX778OFF8K//RucdVYY3XPeear8c50SgIjs0qef\nwjXXQLdu0KFDWJFr8GBNzZwvlABE5Gu2bg1NPB07hrP8xYvDQutajCW/qA9ARL7kDo8/Htr5u3aF\nefPg0EPjjkrSRQlARIAwVcMvfxnO/h96KCy0LvlNTUAiBW7FChgwIHTu/vu/wyuvqPIvFEoAIgVq\n82YYNgx+8AM45BB4910YOBB23z3uyCRT1AQkUmAqK+HRR+F3vwtz9syfHxKAFB4lAJEC8tpr8Itf\nQEVF6Ozt2TPuiCRONTYBmVmJma02s4VJZePN7PXosczMXk/a9lszW2Jm75jZKUnlXcxsYbRtVHo+\niojsyrp1MGhQuJHriitg7lxV/lJ7H8AYoE9ygbv3c/fO7t4ZeCp6YGYdgQuAjtFr7jH78j7B0cBl\n7l4EFJnZV44pIulRURGmaO7YMdy8tXhxaOffTb1/Qi1NQO4+28zaVrctqtzPB06Mis4ESt29HPjA\nzJYC3c3sQ6CFu5dF+40D+gLTGh6+iOzKvHlw1VXQvHlYeP0HP4g7Isk2DTkPOAFY7e7vRc+/C6xI\n2r4COKia8pVRuYikwZo1cNllYVjnf/4nzJqlyl+q15BO4P7AY40VyE7Dhg378u/i4mKKi4sb+y1E\n8lJlJZSUhNE9F18M77wDe+8dd1SSDolEgkQi0eDjmLvXvENoAprs7kcllTUhnNUf4+4fR2XXA7j7\n8Oj5NGAo8CEww907ROX9gd7uPqia9/La4hGRr3v7bbjyStixI7T564y/sJgZ7l7vuVlTbQL6MbB4\nZ+UfeQboZ2bNzOx7QBFQ5u6rgI1m1j3qNxgATErxfUUkydat4Yy/uBguuigswK7KX+qqtmGgpcAc\noL2ZLTezgdGmC4DS5H3dfREwAVgETAWGJJ3ODwEeAJYAS91dHcAiDfTcc3DkkWGu/gULwjTNuotX\n6qPWJqBMUhOQSO1WrQqTts2bB3ffDaedFndEErdMNwGJSIa5w4MPQqdOYeqGt95S5S8No6kgRHLA\n+++HO3g3bAgLsnfqFHdEkg90BSCSxSoq4I474Nhj4ZRTwhQOqvylsegKQCRLLVoUbuhq2hTmzIH2\n7eOOSPKNrgBEskx5Odx8M/TqFRZqSSRU+Ut66ApAJIvMnw8//zm0ahX+btMm7ogkn+kKQCQLlJfD\njTdCnz5h
iOeUKar8Jf10BSASs0WL4JJLYP/9w1l/69ZxRySFQlcAIjGpqICRI0Nb/xVXwNSpqvwl\ns3QFIBKD996Dn/0MzKCsDNq1izsiKUS6AhDJIHcYPRq6d4ezzw4jfFT5S1x0BSCSIR9/HJZjXL8e\nZs+GDh3ijkgKna4ARDJg0iQ45hj44Q/DTV2q/CUb6ApAJI02b4Zf/QpeeAEmToSePeOOSORfdAUg\nkiavvRbO+rdtgzfeUOUv2UcJQKSRVVTArbeGqZpvugkeekhr80p2UhOQSCNavjzc1FVZCa++qrt5\nJbvpCkCkkUycCF27wqmnwvTpqvwl++kKQKSBtm2Da6+FZ5+FyZPD3P0iuaC2ReFLzGy1mS2sUn61\nmS02s7fM7NaorJmZjTGzBWb2hpn1Ttq/i5ktNLMlZjYqPR9FJPOWLAmdu6tWhXl8VPlLLqmtCWgM\n0Ce5wMxOBM4AOrn7kcDIaNPlQKW7dwJOBv4n6WWjgcvcvQgoMrOvHFMkFz32WKj8L78cJkyAffaJ\nOyKR+qmxCcjdZ5tZ2yrFg4Fb3L082mdNVN4BmLGzzMw2mFk3YAXQwt3Lov3GAX2BaY3yCUQybMsW\nuOaacDfvCy/A0UfHHZFIalLpBC4CepnZXDNLmFnXqPxN4Awz293Mvgd0AVoDBxGSwE4rozKRnPP2\n26GZZ9u2MMpHlb/kslQ6gZsA+7p7j+gMfwLQDighXAW8CnwIzAEqAK/PwYcNG/bl38XFxRQXF6cQ\nokjjGzs2dPaOGPGvmTxF4pBIJEgkEg0+jrnXXD9HTUCT3f2o6PlUYLi7z4yeLwW6u/u6Kq97CbgM\n+AyY7u4dovL+QG93H1TNe3lt8Yhk2rZt/2ryefJJOOKIuCMS+Sozw93rfUqSShPQJOCk6E3bA83c\nfZ2ZNTezPaPyk4Fyd3/H3T8BNppZdzMzYEB0DJGst2wZHHccbNgQ5u1X5S/5pLZhoKWEppz2Zrbc\nzAYSmnraRUNDS4FLot0PBF4zs0XAtYSKfqchwAPAEmCpu6sDWLLelCnQowcMGACPPw4tWsQdkUjj\nqrUJKJPUBCTZoKIiLNBeUgLjx8Pxx8cdkUjNUm0C0p3AIknWroWLLoLt28Mon5Yt445IJH00F5BI\npKwMunQJQztfeEGVv+Q/XQGIEJp7rr8e7r0Xzjor7mhEMkMJQApaeTn88pfhjH/WLDj88LgjEskc\nJQApWGvWwHnnwV57heafb30r7ohEMkt9AFKQ5s+Hbt3CCJ+nn1blL4VJVwBScEpLw529o0fDuefG\nHY1IfJQApGBUVMBvfxumc3jxRejUKe6IROKlBCAFYf166NcvJIFXXoH99os7IpH4qQ9A8t6SJWFK\nh44dYdo0Vf4iOykBSF6bMSN09P7Xf8Gf/wxNdM0r8iUlAMlbDzwQmn1KS8OyjSLyVTofkrxTUQG/\n+Q1Mnhzm8G/fPu6IRLKTEoDklU2b4MILYfNmmDsXvv3tuCMSyV5qApK88eGHYfGWVq3guedU+YvU\nRglA8sLcufDDH8LAgWFCt6ZN445IJPupCUhy3lNPwaBBMGYM/OQncUcjkjuUACSn3XEHjBwZmnyO\nOSbuaERyixKA5KSKCvjVr+Dvf4eXXoJDDok7IpHcU9ui8CVmtjpaAD65/GozW2xmb5nZrVHZHmZW\namYLzGyRmV2ftH8XM1toZkvMbFR6PooUii1bwiRuCxeq8hdpiNo6gccAfZILzOxE4Aygk7sfCYyM\nNvUDcPdOQBfgSjNrE20bDVzm7kVAkZl95ZgidfXpp3DSSWEO/2nTYJ994o5IJHfVmADcfTawvkrx\nYOAWdy+P9lkTlX8C7GlmuwN7AtuBjWbWCmjh7mXRfuOAvo0UvxSQf/wDevaEk0+GceOgWbO4IxLJ\nbakMAy0CepnZXDNLmFlXAHd/DthISAQfALe5+wbgIGBF0utXRmUidfbSS9
CrV5jO+b//G8zijkgk\n96XSCdwE2Nfde5hZN2AC0M7MLgaaA62AbwOzzezF+h582LBhX/5dXFxMcXFxCiFKPvnrX+GKK+CR\nR+DUU+OORiR+iUSCRCLR4OOYu9e8g1lbYLK7HxU9nwoMd/eZ0fOlQA/gJmCOuz8SlT8ITAX+D5jh\n7h2i8v5Ab3cfVM17eW3xSGG591648Ub42980zFNkV8wMd6/3dXEqTUCTgJOiN20PNHX3tcA7SeV7\nEpLCO+6+itAX0N3MDBgQHUNkl9zhpptgxAiYNUuVv0g61NgEZGalQG9gPzNbDtwAlAAl0dDQ7cCl\n0e73Ag9G5bsBJe7+VrRtCDCW0EQ0xd2nNfYHkfxRURHW7J0zJ7T9t2wZd0Qi+anWJqBMUhOQbNsG\nAwbAunUwaRLsvXfcEYlkv0w2AYmkxWefwWmnhb+nTlXlL5JuSgCSFVatguJiOOIIGD8evvGNuCMS\nyX9KABK7pUvDPP7nnAN33QW77x53RCKFQZPBSawWLoQ+fWDo0DDWX0QyRwlAYjNvHpxxBtx5J1xw\nQdzRiBQeJQCJxfTp0K8fjB0Lp58edzQihUl9AJJxzzwTKv8nnlDlLxInJQDJqEcfDW39U6ZA795x\nRyNS2NQEJBlzzz1wyy2h+adjx7ijERElAMmI4cPh/vth5kxo1y7uaEQElAAkzdzDHP6TJ8Ps2fDd\n78YdkYjspAQgaVNZCVddBa++Gs78998/7ohEJFnWdQKvr7oApeSkigoYOBAWLYIXX1TlL5KNsi4B\nLFgQdwTSUDt2wKWXwsqVmtRNJJtlXQJ48824I5CG2LEDLrkEPv00tPt/85txRyQiu5J1fQBKALlr\nxw64+GLYsAGefhqaN487IhGpiRKANIrycrjoIti0KSzksscecUckIrXJuhXBvvlN57PPoEnWpSbZ\nlfJy6N8ftm6Fp55S5S+SaXmzIlibNmHkiOSG7dvDTJ5ffAETJ6ryF8klNSYAMysxs9XRQu/J5Veb\n2WIze8vMhkdlF5nZ60mPCjPrFG3rYmYLzWyJmY2q6T27dg3jxiX7bd8O558fhnw++aRW8RLJNbVd\nAYwB+iQXmNmJwBlAJ3c/EvgfAHd/1N07u3tnYACwzN13DuocDVzm7kVAkZl95ZjJunWDV15J7cNI\n5nzxBZx7LpiFWT1V+YvknhoTgLvPBqremjUYuMXdy6N91lTz0guBUgAzawW0cPeyaNs4oO+u3lNX\nANlv27awfGPTpjBhAjRrFndEIpKKVPoAioBeZjbXzBJm1rWafc4nSgDAQcCKpG0ro7JqHX00vP12\nOMOU7LNtG5x9dhjiOX58SAIikptSGWvTBNjX3XuYWTdgAvDl/I5m1h3Y4u4pdeWOGDGMFi3g6qvh\nwguLKS4uTuUwkgZbt8JZZ8G3vgWPPKLKXyQuiUSCRCLR4OPUOgzUzNoCk939qOj5VGC4u8+Mni8F\nurv7uuj5n4HV7r6zc7gVMN3dO0TP+wO93X1QNe/l7s5ll4W+gEFf20PisnUrnHkm7LcfPPywhumK\nZJNMDgOdBJwUvWl7oFlS5b8bcB4wfufO7v4JsNHMupuZETqIJ9X0Bl27qiM4m2zZAj/9KRxwgCp/\nkXxS2zDQUmAO0N7MlpvZQKAEaBcNDS0FLkl6SS/gI3f/oMqhhgAPAEuApe4+rab37d4d5s6t1+eQ\nNNm8GX7yE2jVCsaNU+Uvkk+y7k5gd2fHDth3X/joo/CvxGNn5d+mDZSUwO67xx2RiFQnb+4EhnCW\neeyxugqI0+efw+mnQ9u2qvxF8lVWJgCAnj3hpZfijqIwbdoEp50GRUXw4IOq/EXyVVYngDlz4o6i\n8GzcGCr/Dh3gvvtgt6z9hYhIQ2VlHwCEpSHbtAn/quMxMzZuhD59oFMnuOceVf4iuSKv+gAgdP4e\ncoiWiMyUzz6DU08Nd2Kr8hcpDFn93/
y442DWrLijyH8bNsApp4T7L+6+W5W/SKHI6v/qJ54IM2bE\nHUV+W78eTj4ZevSAO+8Ms3uKSGHI2j4AgFWrQmfk2rUaiZIO//xnqPx79YLbb1flL5Kr8q4PAKBl\ny3AH6htvxB1J/lm3Dn78YyguVuUvUqiyOgGAmoHSYe1a+NGPQgIYOVKVv0ihyvoEcNJJMH163FHk\njzVrQuXfpw/ceqsqf5FCltV9ABDOVg89NPyr+ecb5tNPQ+V/xhlw882q/EXyRV72AQDsvz8cdhi8\n/HLckeS21avD1VTfvqr8RSTI+gQAYWqCqVPjjiJ3rVoV+lLOOQduukmVv4gESgB57pNPQuV/wQVw\n442q/EXkX7K+DwBgxw74znfgrbfgu9+NIbAc9fHHofK/+GL4wx/ijkZE0iVv+wAgTAZ38skwrcZ1\nxCTZypVhjP+ll6ryF5Hq5UQCgNAMNGVK3FHkhhUrQuX/85/D734XdzQikq1yogkIwvj1ww4LHZrN\nm2c4sByyfHlo9rnySrj22rijEZFMSEsTkJmVmNnqaAH45PKrzWyxmb1lZrcmlXcys5ej8gVm1iwq\n72JmC81siZmNqm+QAAccAMccA88/n8qrC8MHH4Qz/8GDVfmLSO1qawIaA/RJLjCzE4EzgE7ufiQw\nMipvAjwMXBGV9wZ2RC8bDVzm7kVAkZl95Zh1dfbZMHFiKq/Mf++9Fyr///gP+PWv445GRHJBjQnA\n3WcD66sUDwZucffyaJ81UfkpwAJ3XxiVr3f3SjNrBbRw97Jov3FA31SC7dsXJk+G8vJUXp2/3n03\nVP7XXw/XXBN3NCKSK1LpBC4CepnZXDNLmFnXpHI3s2lm9pqZ7WyEOAhYkfT6lVFZvR18cFioXJPD\n/cuiReEO3xtvhEGD4o5GRHJJKqvtNgH2dfceZtYNmAC0A5oCxwNdga3Ai2b2GvBZfQ4+bNiwL/8u\nLi6muLj4K9svuABKS8MKVoVuwYIwqduIEWGsv4gUhkQiQSKRaPBxah0FZGZtgcnuflT0fCow3N1n\nRs+XAj2AHwGnufvPovLfA9uAR4AZ7t4hKu8P9Hb3r52v1jQKaKdPPoGOHcM4929+s+4fNN/Mnw+n\nnw6jRoWkKCKFK5M3gk0CToretD3QzN3XAs8DR5lZ86hDuDfwtruvAjaaWXczM2BAdIyUtGoFxx4L\nzzyT6hFyX1lZuC/inntU+YtI6mobBloKzAHam9lyMxsIlADtoqGhpcAlEDp9gduBV4DXgdfcfecM\nPkOAB4AlwFJ3b9A9vRdfDI880pAj5K45c+AnP4EHHgijokREUpUzN4Il+/zz0CG8aFG4IigU06dD\nv34wblxo+xcRgTyfC6iqvfaC886DkpK4I8mcSZNC5f/EE6r8RaRx5OQVAIRO0LPOgvffh913T3Ng\nMRs7Fn77W/jb36BLl7ijEZFsU1BXABCmhWjZMv/XCbjjDhg6NNz7oMpfRBpTziYACDc+3X133FGk\nhzvccAOMHg2zZ8Phh8cdkYjkm5xtAgLYtg3atYPnnoOjjkpjYBlWWRmmdJgzJ6yB8J3vxB2RiGSz\ngmsCAthjj1BRjhgRdySN54svwjDXBQtCs48qfxFJl5y+AgDYsAEOPTR0Ch9ySJoCy5D160PH9re/\nDY8+qnUPRKRuCvIKAGCffeDyy+GPf4w7kob54AM47jjo3DkM9VTlLyLplvNXAAD//Ce0bx/azNu3\nT0Ngafbqq3DmmXDddZrOWUTqL9UrgLxIAAB/+hO8+SY8/ngjB5VmkyeHtXvvuy80/4iI1FfBJ4DN\nm8PZ/xNPQM+ejRxYGrjDLbeEYaxPPQU9esQdkYjkqlQTQCrrAWSlPfeEkSPDerivvQZNsviTff45\n/OxnsGJFmNnzoJSWxxERaZic7wRO1q9fWDz+rrvijmTX3nsPfvhD+Na3YOZMVf4iEp+8aQLa6R//\nCE
1A//d/2Xf37PPPw4ABYWqHwYPB6n3BJiLydQXfB5DsL3+B+++Hl1+GZs0aIbAGKi8P0zo8/DA8\n9hj06hV3RCKST5QAkrhD377Qpk38zUHvvw/9+8P++4dZPQ84IN54RCT/FOyNYNUxg4cegr//He69\nN54Y3MOqXT16hATwt7+p8heR7JLFY2UaZp99whj7448P00afeWbm3nvZsnB38mefwYsv5tdEdSKS\nP/LyCmCnww6DZ5+FK67IzCLyX3wBt90G3brBqaeGPghV/iKSrWpbFL7EzFZHC8Anl19tZovN7C0z\nuzUqa2tmW83s9ehxT9L+XcxsoZktMbNR6fko1evSJTS/XHkl/O//puc93OHJJ6FDhzB3/8svw7XX\nZve9CCIiNXYCm9kJwOfAOHc/Kio7EfgdcLq7l5vZAe6+xszaApN37lflOGXAL9y9zMymAHe6+7Rq\n9muUTuDqLFsGP/1pWEls1CjYd9+GH7OyMiSX4cPDnci33w4/+lHDjysiUh9p6QR299nA+irFg4Fb\n3L082mdNLYG1Alq4e1lUNA7oW99AG+p734N582DvvUOzzNixUFGR2rE2bQodvEccATfeGCZwmz9f\nlb+I5JZU+gCKgF5mNtfMEmbWNWnb96Lmn4SZHR+VHQSsSNpnZVSWcXvuGZqBJkyAkpJwo9htt8FH\nH9X+2vXrwxj+s86C1q3h6afDPD6vvhruQM73helFJP+k0krdBNjX3XuYWTdgAtAO+Bg42N3Xm9kx\nwCQzO6K+Bx82bNiXfxcXF1NcXJxCiDXr2TNMw/Dyy+FMvmvXMDXDD34QrhT23js072zcGObpX7Qo\nzNvTqxecd15IHo3RhCQikopEIkEikWjwcWq9Eaxq276ZTQWGu/vM6PlSoLu7r6vyuhnAr4FPgOnu\n3iEq7w/0dvdB1bxX2voAalJZCW+/HR4ffhiaeMxCIjjkkHCl0LGjOnVFJDtlcjbQScBJwEwzaw80\nc/d1ZrY/sN7dK8ysHaGp6H1332BmG82sO1AGDADuTOF902a33UK/gIZsikghqTEBmFkp0BvYz8yW\nAzcAJUBJNDR0O3BJtHsv4CYzKwcqgSvdfUO0bQgwFmgOTKluBJCIiGRWXs4FJCJSSDQXkIiI1IsS\ngIhIgVICEBEpUEoAIiIFSglARKRAKQGIiBQoJQARkQKlBCAiUqCUAERECpQSgIhIgVICEBEpUEoA\nIiIFSglARKRAKQGIiBQoJQARkQKlBCAiUqCUAERECpQSgIhIgaoxAZhZiZmtjtb/TS6/2swWm9lb\nZnZrlW1sYFxFAAAD4UlEQVRtzOxzM/t1UlkXM1toZkvMbFTjfgQREUlFbVcAY4A+yQVmdiJwBtDJ\n3Y8ERlZ5ze3As1XKRgOXuXsRUGRmfZC0SyQScYeQN/RdNi59n9mhxgTg7rOB9VWKBwO3uHt5tM+a\nnRvMrC/wPrAoqawV0MLdy6KicUDfhocutdF/ssaj77Jx6fvMDqn0ARQBvcxsrpklzKwrgJntBfwG\nGFZl/4OAFUnPV0ZlIiISoyYpvmZfd+9hZt2ACUA7QsX/Z3ffYmbWiDGKiEg6uHuND6AtsDDp+VSg\nd9LzpcD+wCxgWfRYD6wDhgAtgcVJ+/cH/rKL93I99NBDDz3q/6itLq/ukcoVwCTgJGCmmbUHmrn7\nWqDXzh3MbCiwyd3viZ5vNLPuQBkwALizugO7u64cREQypMYEYGalQG9gPzNbDtwAlAAl0dDQ7cAl\ndXifIcBYoDkwxd2nNSRoERFpOIuaXkREpMDEciewmfUxs3eiG8Ou28U+d0bb3zSzzpmOMVfU9l2a\nWbGZfWZmr0eP38cRZy7Y1Y2PVfbR77KOavs+9dusOzM72MxmmNnb0Q241+xiv/r9PlPpOGjIA9id\n0HHcFmgKvAF0qLLP6YSmIoDuwNxMx5kLjzp+l8XAM3HHmgsP4ASg
M0mDHqps1++ycb9P/Tbr/l22\nBI6O/t4LeLcx6s04rgCOBZa6+wcebiYbD5xZZZ8zgIcA3H0esI+ZHZjZMHNCXb5LAHWu14FXf+Nj\nMv0u66EO3yfot1kn7r7K3d+I/v4cWAx8t8pu9f59xpEADgKWJz1fwddvDKtun9ZpjisX1eW7dKBn\ndEk4xcw6Ziy6/KPfZePSbzMFZtaWcGU1r8qmev8+UxkG2lB17XWuemag3uqvq8t3Mh842MMNeqcR\nhvG2T29YeU2/y8aj32Y9RTMuPAn8R3Ql8LVdqjyv8fcZxxXASuDgpOcH89WpIqrbp3VUJl9V63fp\n7pvcfUv091SgqZl9O3Mh5hX9LhuRfpv1Y2ZNgaeAR9x9UjW71Pv3GUcCeJUwI2hbM2sGXAA8U2Wf\nZ4juLzCzHsAGd1+d2TBzQq3fpZkduHNqDjM7ljD095+ZDzUv6HfZiPTbrLvoe3oQWOTud+xit3r/\nPjPeBOTuO8zsF8BzhFEsD7r7YjO7Mtp+r7tPMbPTzWwpsBkYmOk4c0FdvkvgXGCwme0AtgD9Ygs4\nyyXd+Lh/dOPjUMLoKv0uU1Db94l+m/VxHHAxsMDMXo/Kfge0gdR/n7oRTESkQGlJSBGRAqUEICJS\noJQAREQKlBKAiEiBUgIQESlQSgAiIgVKCUBEpEApAYiIFKj/D3SXgS8jUbMGAAAAAElFTkSuQmCC\n", | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x10ab0ff28>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"plot = plt.plot(\n", | |
" x_axis, \n", | |
" [i[1] for i in y_axis],\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"X-Axis: The identity matrix coefficient ($\\lambda$)  \n", | |
"Y-Axis: SSE output of the corresponding weight vector\n", | |
"\n", | |
"As $\\lambda$ grows past the minimum toward 1, the error goes up, possibly indicating that the model is too generalized (underfitting); when $\\lambda$ is too close to 0, the model appears to be too complex (overfitting). The best $\\lambda$ value appears to be approximately 0.21." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Question 6" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"modifier = 10*identity\n", | |
"output_w = data_frame_invert(d.T.dot(d) + modifier).dot(d.T.dot(y))\n", | |
"output_w = pandas.Series(output_w, index=d.columns)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"CRIM -0.098055\n", | |
"ZN 0.051645\n", | |
"INDUS -0.021914\n", | |
"CHAS 2.497114\n", | |
"NOX 0.009656\n", | |
"RM 5.449644\n", | |
"AGE 0.000539\n", | |
"DIS -1.018682\n", | |
"RAD 0.238522\n", | |
"TAX -0.013011\n", | |
"PTRATIO -0.403960\n", | |
"B 0.016906\n", | |
"LSTAT -0.514954\n", | |
"dummy 2.732832\n", | |
"dtype: float64" | |
] | |
}, | |
"execution_count": 33, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"output_w" | |
] | |
}, | |
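{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"With $\\lambda = 10$, most of the weight values get weaker (closer to 0) than in Question 2, most dramatically `NOX` and `dummy`, although a few (`ZN`, `INDUS`, `RM`, `B`) grow in magnitude." | |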
] | |
}, | |
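{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"One way to make the shrinkage plausible, sketched here under the assumption that the weights come from the regularized formula $w = (X^T X + \\lambda I)^{-1} X^T y$ used in the code above: since $X^T X$ is positive semidefinite, every eigenvalue of $X^T X + \\lambda I$ is at least $\\lambda$, so\n", | |
"\n", | |
"$$\\|w\\|_2 \\le \\|(X^T X + \\lambda I)^{-1}\\|_2 \\, \\|X^T y\\|_2 \\le \\frac{\\|X^T y\\|_2}{\\lambda}$$\n", | |
"\n", | |
"This bounds the overall length of $w$, not each coordinate, so an individual weight may still grow (as `RM` does above) while the vector as a whole shrinks." | |
] | |
}, | |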
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Question 7" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In problem 6) we observed that the optimal w (for the modified objective) decreases in length as lambda increases. In fact one can show using matrix norms that the expression for this optimal w (given in prob 5) has length that goes as (is asymptotic to) $\\lambda^{-1}$. Therefore this optimal w goes to zero as lambda goes to infinity.\n", | |
"\n", | |
"We can also see suggestions of this from the modified objective function itself without explicitly solving for the optimal w as a function of lambda. Generally speaking this modified objective function penalizes for large w lengths with this penalty increasing in severity as lambda increases (holding w, x, and y constant). More precisely the partial derivative with respect to lambda is given as $|w|^2$. This implies that objective function values at w's other than the optimal w have no chance of becoming minimal at larger lambda values except for when they do not exceed the current optimal w in length. Indeed the w must get shorter as we see from the explicit form in problem 5)." | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.5.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |