matxpg · December 1, 2014 01:13
diff --git a/gistfile1.txt b/gistfile1.txt
 {
 "metadata": {
  "name": "Untitled0"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "import pandas as pd\nimport urllib2\nimport numpy \n\n#Get the assignment data\nlink = 'http://math.usask.ca/~laverty/S245/Assignments/Assignments%20Fall%202012/CompAsst/Asst2Computer2013data.xls'\nsocket = urllib2.urlopen(link)\n\n#Read the excel spreadsheet into a pandas data frame, ignoring the first column of the file as it is redundant.\nxd = pd.ExcelFile(socket)\ndf = xd.parse(xd.sheet_names[-1], header=0, parse_cols = [x for x in range(1, 11)])\n\n\n\n",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 56
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Matthew Galbraith - mpg317 - 11138371\n-------------------------------------\nSTAT-245-01 (Laverty) - Assignment 2 - Due Oct. 31, 2014\n--------------------------------------------------------\n\n<u>**Compute the correlation between each pair of events**</u>\n\nFrom the data, I created a table showing the correlation of each pair of events at the Olympic decathlon. For the table, one thing to note is that the diagonals in the table are paired with themselves and thus not important. Also, RiCj (Row i Column j) is the same as RjCi - because a pair of events e.g. x1, x2 is the same as the pair x2, x1.\n\nThe values are computed using *Pearson's rank correlation coefficient* **r**.\n\n\n<u>**Determine which events are most highly correlated and which events are least correlated.**</u>\n\nAfter making the table, I found that the events with the highest correlation were x3 and x7 - shotput and discus. They have a correlation\n\ncoefficient value of approximately 0.704722. The events with the lowest correlation were x1 and x10 - 100m and 1500m, with a correlation\n\ncoefficient value of approximately -0.045854 - a slight negative correlation. \n\n\n<u>**Comment.**</u>\n\nThe 100m and 1500m were least correlated, but they were the only pair of events in the set of all pairs of events in the decathlon that had a slight negative correlation. One thing to note is that they were both running events, but where the 100m is a sprint, the 1500m is more of a longer-distance run - sprinters may not have the endurance to perform well in a long distance run, and long distance runners may not have the speed to perform well in short distance sprints. The highest correlated events, shotput and discus, are two similiar sports. Shotput involves throwing a \"shot\" (a spherical ball) and discus involves throwing a disc. 
This could be why there is a such a positive correlation between performance in the two events, as both events would require similiar upper body training to perform well. Other events are interesting to comment on as well. The second least correlated events,  x9 x10 (Javelin and 1500m) pair with a correlation coefficient of approximately 0.056698 makes sense, as throwing a javelin is very different to running 1500m. The second highest correlated events, x1 x5 (100m and 400m) with a correlation coefficient of approximately 0.643485. These are two shorter distance running events (one is one length of a track, and the other is one quarter of the length of a track). It would make sense that the performance in a short distance running event could be somewhat correlated with the performance of another short distance running event."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "correlation_matrix = df.corr(method='pearson')\ncorrelation_matrix",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>x1</th>\n      <th>x2</th>\n      <th>x3</th>\n      <th>x4</th>\n      <th>x5</th>\n      <th>x6</th>\n      <th>x7</th>\n      <th>x8</th>\n      <th>x9</th>\n      <th>x10</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>x1</th>\n      <td> 1.000000</td>\n      <td> 0.585335</td>\n      <td> 0.280897</td>\n      <td> 0.223010</td>\n      <td> 0.643485</td>\n      <td> 0.329727</td>\n      <td> 0.182300</td>\n      <td> 0.151807</td>\n      <td> 0.117975</td>\n      <td>-0.045854</td>\n    </tr>\n    <tr>\n      <th>x2</th>\n      <td> 0.585335</td>\n      <td> 1.000000</td>\n      <td> 0.322786</td>\n      <td> 0.559354</td>\n      <td> 0.502929</td>\n      <td> 0.560442</td>\n      <td> 0.270514</td>\n      <td> 0.317594</td>\n      <td> 0.207177</td>\n      <td> 0.070008</td>\n    </tr>\n    <tr>\n      <th>x3</th>\n      <td> 0.280897</td>\n      <td> 0.322786</td>\n      <td> 1.000000</td>\n      <td> 0.358365</td>\n      <td> 0.396357</td>\n      <td> 0.234075</td>\n      <td> 0.704722</td>\n      <td> 0.214011</td>\n      <td> 0.523221</td>\n      <td> 0.116284</td>\n    </tr>\n    <tr>\n      <th>x4</th>\n      <td> 0.223010</td>\n      <td> 0.559354</td>\n      <td> 0.358365</td>\n      <td> 1.000000</td>\n      <td> 0.268387</td>\n      <td> 0.548898</td>\n      <td> 0.290969</td>\n      <td> 0.474351</td>\n      <td> 0.129864</td>\n      <td> 0.266634</td>\n    </tr>\n    <tr>\n      <th>x5</th>\n      <td> 0.643485</td>\n      <td> 0.502929</td>\n      <td> 0.396357</td>\n      <td> 0.268387</td>\n      <td> 1.000000</td>\n      <td> 0.315019</td>\n      <td> 0.386719</td>\n      <td> 0.289851</td>\n      <td> 0.403909</td>\n      <td> 0.414685</td>\n    </tr>\n    <tr>\n      <th>x6</th>\n      <td> 0.329727</td>\n      <td> 0.560442</td>\n  
    <td> 0.234075</td>\n      <td> 0.548898</td>\n      <td> 0.315019</td>\n      <td> 1.000000</td>\n      <td> 0.194883</td>\n      <td> 0.479401</td>\n      <td> 0.184994</td>\n      <td> 0.063186</td>\n    </tr>\n    <tr>\n      <th>x7</th>\n      <td> 0.182300</td>\n      <td> 0.270514</td>\n      <td> 0.704722</td>\n      <td> 0.290969</td>\n      <td> 0.386719</td>\n      <td> 0.194883</td>\n      <td> 1.000000</td>\n      <td> 0.343120</td>\n      <td> 0.393074</td>\n      <td> 0.244814</td>\n    </tr>\n    <tr>\n      <th>x8</th>\n      <td> 0.151807</td>\n      <td> 0.317594</td>\n      <td> 0.214011</td>\n      <td> 0.474351</td>\n      <td> 0.289851</td>\n      <td> 0.479401</td>\n      <td> 0.343120</td>\n      <td> 1.000000</td>\n      <td> 0.153606</td>\n      <td> 0.297925</td>\n    </tr>\n    <tr>\n      <th>x9</th>\n      <td> 0.117975</td>\n      <td> 0.207177</td>\n      <td> 0.523221</td>\n      <td> 0.129864</td>\n      <td> 0.403909</td>\n      <td> 0.184994</td>\n      <td> 0.393074</td>\n      <td> 0.153606</td>\n      <td> 1.000000</td>\n      <td> 0.056698</td>\n    </tr>\n    <tr>\n      <th>x10</th>\n      <td>-0.045854</td>\n      <td> 0.070008</td>\n      <td> 0.116284</td>\n      <td> 0.266634</td>\n      <td> 0.414685</td>\n      <td> 0.063186</td>\n      <td> 0.244814</td>\n      <td> 0.297925</td>\n      <td> 0.056698</td>\n      <td> 1.000000</td>\n    </tr>\n  </tbody>\n</table>\n<p>10 rows \u00d7 10 columns</p>\n</div>",
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 61,
       "text": "           x1        x2        x3        x4        x5        x6        x7  \\\nx1   1.000000  0.585335  0.280897  0.223010  0.643485  0.329727  0.182300   \nx2   0.585335  1.000000  0.322786  0.559354  0.502929  0.560442  0.270514   \nx3   0.280897  0.322786  1.000000  0.358365  0.396357  0.234075  0.704722   \nx4   0.223010  0.559354  0.358365  1.000000  0.268387  0.548898  0.290969   \nx5   0.643485  0.502929  0.396357  0.268387  1.000000  0.315019  0.386719   \nx6   0.329727  0.560442  0.234075  0.548898  0.315019  1.000000  0.194883   \nx7   0.182300  0.270514  0.704722  0.290969  0.386719  0.194883  1.000000   \nx8   0.151807  0.317594  0.214011  0.474351  0.289851  0.479401  0.343120   \nx9   0.117975  0.207177  0.523221  0.129864  0.403909  0.184994  0.393074   \nx10 -0.045854  0.070008  0.116284  0.266634  0.414685  0.063186  0.244814   \n\n           x8        x9       x10  \nx1   0.151807  0.117975 -0.045854  \nx2   0.317594  0.207177  0.070008  \nx3   0.214011  0.523221  0.116284  \nx4   0.474351  0.129864  0.266634  \nx5   0.289851  0.403909  0.414685  \nx6   0.479401  0.184994  0.063186  \nx7   0.343120  0.393074  0.244814  \nx8   1.000000  0.153606  0.297925  \nx9   0.153606  1.000000  0.056698  \nx10  0.297925  0.056698  1.000000  \n\n[10 rows x 10 columns]"
      }
     ],
     "prompt_number": 61
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "#Unstack the table and sort\ndf2 = correlation_matrix.unstack()\ndf2.sort(kind='quicksort')\nprint \"Sorted correlation coefficient values, ignore the last 10 which are pairs of themselves\"\nprint df2",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "Sorted correlation coefficient values, ignore the last 10 which are pairs of themselves\nx10  x1    -0.045854\nx1   x10   -0.045854\nx9   x10    0.056698\nx10  x9     0.056698\nx6   x10    0.063186\nx10  x6     0.063186\nx2   x10    0.070008\nx10  x2     0.070008\nx3   x10    0.116284\nx10  x3     0.116284\nx1   x9     0.117975\nx9   x1     0.117975\nx4   x9     0.129864\nx9   x4     0.129864\nx1   x8     0.151807\n...\nx1   x2     0.585335\nx5   x1     0.643485\nx1   x5     0.643485\nx7   x3     0.704722\nx3   x7     0.704722\nx1   x1     1.000000\nx8   x8     1.000000\nx7   x7     1.000000\nx6   x6     1.000000\nx5   x5     1.000000\nx4   x4     1.000000\nx3   x3     1.000000\nx2   x2     1.000000\nx9   x9     1.000000\nx10  x10    1.000000\nLength: 100, dtype: float64\n"
      }
     ],
     "prompt_number": 71
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "",
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
 }
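The ranking step in the notebook can also be sketched without pandas. What follows is a minimal Python 3 illustration, not the author's code: it copies the correlation values from the notebook output and lists each pair of events once (upper triangle only), so the mirrored duplicates and the self-paired diagonal never appear in the sorted result. The names `events`, `corr`, and `ranked_pairs` are assumptions made for this example.

```python
from itertools import combinations

# Event labels as they appear in the notebook's correlation table.
events = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10"]

# Correlation matrix copied from the notebook output (rows and columns x1..x10).
corr = [
    [1.000000, 0.585335, 0.280897, 0.223010, 0.643485, 0.329727, 0.182300, 0.151807, 0.117975, -0.045854],
    [0.585335, 1.000000, 0.322786, 0.559354, 0.502929, 0.560442, 0.270514, 0.317594, 0.207177, 0.070008],
    [0.280897, 0.322786, 1.000000, 0.358365, 0.396357, 0.234075, 0.704722, 0.214011, 0.523221, 0.116284],
    [0.223010, 0.559354, 0.358365, 1.000000, 0.268387, 0.548898, 0.290969, 0.474351, 0.129864, 0.266634],
    [0.643485, 0.502929, 0.396357, 0.268387, 1.000000, 0.315019, 0.386719, 0.289851, 0.403909, 0.414685],
    [0.329727, 0.560442, 0.234075, 0.548898, 0.315019, 1.000000, 0.194883, 0.479401, 0.184994, 0.063186],
    [0.182300, 0.270514, 0.704722, 0.290969, 0.386719, 0.194883, 1.000000, 0.343120, 0.393074, 0.244814],
    [0.151807, 0.317594, 0.214011, 0.474351, 0.289851, 0.479401, 0.343120, 1.000000, 0.153606, 0.297925],
    [0.117975, 0.207177, 0.523221, 0.129864, 0.403909, 0.184994, 0.393074, 0.153606, 1.000000, 0.056698],
    [-0.045854, 0.070008, 0.116284, 0.266634, 0.414685, 0.063186, 0.244814, 0.297925, 0.056698, 1.000000],
]


def ranked_pairs(labels, matrix):
    """Return (label_i, label_j, r) for every i < j, sorted by r ascending."""
    pairs = [
        (labels[i], labels[j], matrix[i][j])
        for i, j in combinations(range(len(labels)), 2)
    ]
    return sorted(pairs, key=lambda pair: pair[2])


pairs = ranked_pairs(events, corr)

print("Two least correlated pairs:")
for a, b, r in pairs[:2]:
    print("  %s %s %.6f" % (a, b, r))

print("Two most highly correlated pairs:")
for a, b, r in reversed(pairs[-2:]):
    print("  %s %s %.6f" % (a, b, r))
```

Taking only the pairs with i < j leaves 45 entries instead of the 100 in the unstacked table, so no trailing rows need to be ignored when reading the result.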