{ | |
"metadata": { | |
"name": "" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": [ | |
"Python Workshop: Timing Optimization" | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Outline" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This notebook determines the best time for the Python workshop I'll be presenting this semester (Spring 2014).\n", | |
"\n", | |
"The libraries used:\n", | |
"\n", | |
"* [cPickle](http://docs.python.org/2/library/pickle.html#module-cPickle)\n", | |
"* [pandas](http://pandas.pydata.org/)\n", | |
"* [datetime](http://docs.python.org/2/library/datetime.html)\n", | |
"\n", | |
"I'll be using a pre-formatted dump of all UAEU courses available this semester. The course dump will be parsed into a `DataFrame` for easier processing.\n", | |
"\n", | |
"The final output will be a `pandas` `DataFrame` listing the candidate times, ordered by the number of conflicting sections, fewest first." | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Implementation" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import cPickle as pickle\n", | |
"import pandas as pd\n", | |
"from datetime import datetime" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's first load up the course dump from the `pickle` file." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# Pickle data is binary, so open the file in binary mode\n", | |
"with open('classes.pickle', 'rb') as f:\n", | |
"    course_dump = pickle.load(f)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Next, we compile a list of all sections that are for boys, are at the undergraduate level, and have a scheduled time (not `TBA`)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"section_list = []\n", | |
"\n", | |
"for courses in course_dump.values():\n", | |
"    for sections in courses.values():\n", | |
"        for section in sections.values():\n", | |
"            # Get the first digit of course code to determine level\n", | |
"            try:\n", | |
"                course_level = int(section['code'][0])\n", | |
"            except (KeyError, IndexError, TypeError, ValueError):\n", | |
"                # Skip sections whose code is missing or does not start with a digit\n", | |
"                continue\n", | |
"\n", | |
"            # Take a section only if it is for boys, is undergrad level, and has a scheduled time\n", | |
"            if section['gender'] == 'B' and course_level < 6 and section['time'] != 'TBA':\n", | |
"                section_list.append(section.values())\n", | |
"\n", | |
"# All section dicts are assumed to share the same keys in the same order,\n", | |
"# so the last one visited provides the column names\n", | |
"columns = section.keys()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 16 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Using `section_list`, we create a `DataFrame` to represent the sections and view its tail." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df = pd.DataFrame(data=section_list, columns=columns)\n", | |
"df.tail()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>status</th>\n", | |
" <th>total</th>\n", | |
" <th>code</th>\n", | |
" <th>title</th>\n", | |
" <th>gender</th>\n", | |
" <th>section</th>\n", | |
" <th>time</th>\n", | |
" <th>days</th>\n", | |
" <th>current</th>\n", | |
" <th>credits</th>\n", | |
" <th>abbrev</th>\n", | |
" <th>location</th>\n", | |
" <th>attribute</th>\n", | |
" <th>duration</th>\n", | |
" <th>instructor</th>\n", | |
" <th>crn</th>\n", | |
" <th>remaining</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>827</th>\n", | |
" <td> NR</td>\n", | |
" <td> 27</td>\n", | |
" <td> 220</td>\n", | |
" <td> Engineering thermodynamics</td>\n", | |
" <td> B</td>\n", | |
" <td> 03</td>\n", | |
" <td> 12:00 pm-01:50 pm</td>\n", | |
" <td> TU</td>\n", | |
" <td> 23</td>\n", | |
" <td> 3.000</td>\n", | |
" <td> GENG</td>\n", | |
" <td> F3 238</td>\n", | |
" <td> English</td>\n", | |
" <td> 02/16-06/19</td>\n", | |
" <td> Naeema I. Al Darmaki (P)</td>\n", | |
" <td> 21864</td>\n", | |
" <td> 4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>828</th>\n", | |
" <td> C</td>\n", | |
" <td> 30</td>\n", | |
" <td> 220</td>\n", | |
" <td> Engineering thermodynamics</td>\n", | |
" <td> B</td>\n", | |
" <td> 01</td>\n", | |
" <td> 12:00 pm-01:50 pm</td>\n", | |
" <td> MW</td>\n", | |
" <td> 30</td>\n", | |
" <td> 3.000</td>\n", | |
" <td> GENG</td>\n", | |
" <td> F3 238</td>\n", | |
" <td> English</td>\n", | |
" <td> 02/16-06/19</td>\n", | |
" <td> Mohammad O. Hamdan (P)</td>\n", | |
" <td> 21862</td>\n", | |
" <td> 0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>829</th>\n", | |
" <td> NR</td>\n", | |
" <td> 27</td>\n", | |
" <td> 220</td>\n", | |
" <td> Engineering thermodynamics</td>\n", | |
" <td> B</td>\n", | |
" <td> 02</td>\n", | |
" <td> 08:00 am-09:50 am</td>\n", | |
" <td> TU</td>\n", | |
" <td> 26</td>\n", | |
" <td> 3.000</td>\n", | |
" <td> GENG</td>\n", | |
" <td> F3 238</td>\n", | |
" <td> English</td>\n", | |
" <td> 02/16-06/19</td>\n", | |
" <td> Naeema I. Al Darmaki (P)</td>\n", | |
" <td> 21863</td>\n", | |
" <td> 1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>830</th>\n", | |
" <td> C</td>\n", | |
" <td> 25</td>\n", | |
" <td> 240</td>\n", | |
" <td> Statics</td>\n", | |
" <td> B</td>\n", | |
" <td> 02</td>\n", | |
" <td> 02:00 pm-03:50 pm</td>\n", | |
" <td> MW</td>\n", | |
" <td> 25</td>\n", | |
" <td> 3.000</td>\n", | |
" <td> GENG</td>\n", | |
" <td> F3 036</td>\n", | |
" <td> \u00a0</td>\n", | |
" <td> 02/16-06/19</td>\n", | |
" <td> Said A. Elkhouly (P)</td>\n", | |
" <td> 20131</td>\n", | |
" <td> 0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>831</th>\n", | |
" <td> NR</td>\n", | |
" <td> 25</td>\n", | |
" <td> 240</td>\n", | |
" <td> Statics</td>\n", | |
" <td> B</td>\n", | |
" <td> 01</td>\n", | |
" <td> 04:00 pm-05:50 pm</td>\n", | |
" <td> TU</td>\n", | |
" <td> 22</td>\n", | |
" <td> 3.000</td>\n", | |
" <td> GENG</td>\n", | |
" <td> F1 2126</td>\n", | |
" <td> \u00a0</td>\n", | |
" <td> 02/16-06/19</td>\n", | |
" <td> Hamad A. Al Jassmi (P)</td>\n", | |
" <td> 20130</td>\n", | |
" <td> 3</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>5 rows \u00d7 17 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 17, | |
"text": [ | |
" status total code title gender section \\\n", | |
"827 NR 27 220 Engineering thermodynamics B 03 \n", | |
"828 C 30 220 Engineering thermodynamics B 01 \n", | |
"829 NR 27 220 Engineering thermodynamics B 02 \n", | |
"830 C 25 240 Statics B 02 \n", | |
"831 NR 25 240 Statics B 01 \n", | |
"\n", | |
" time days current credits abbrev location attribute \\\n", | |
"827 12:00 pm-01:50 pm TU 23 3.000 GENG F3 238 English \n", | |
"828 12:00 pm-01:50 pm MW 30 3.000 GENG F3 238 English \n", | |
"829 08:00 am-09:50 am TU 26 3.000 GENG F3 238 English \n", | |
"830 02:00 pm-03:50 pm MW 25 3.000 GENG F3 036 \u00a0 \n", | |
"831 04:00 pm-05:50 pm TU 22 3.000 GENG F1 2126 \u00a0 \n", | |
"\n", | |
" duration instructor crn remaining \n", | |
"827 02/16-06/19 Naeema I. Al Darmaki (P) 21864 4 \n", | |
"828 02/16-06/19 Mohammad O. Hamdan (P) 21862 0 \n", | |
"829 02/16-06/19 Naeema I. Al Darmaki (P) 21863 1 \n", | |
"830 02/16-06/19 Said A. Elkhouly (P) 20131 0 \n", | |
"831 02/16-06/19 Hamad A. Al Jassmi (P) 20130 3 \n", | |
"\n", | |
"[5 rows x 17 columns]" | |
] | |
} | |
], | |
"prompt_number": 17 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This `DataFrame` can easily be written to a CSV using the `to_csv` method. We don't need to yet - there's still the analysis to do!\n", | |
"\n", | |
"We'll need to define a new class that represents a section's time. Let's call it `SectionTime`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"class SectionTime(object):\n", | |
" dt_cache = {}\n", | |
" \n", | |
" def __init__(self, time_string, day_string, format_string, seperator='-'):\n", | |
" # Time\n", | |
" start_string, end_string = time_string.split(seperator)\n", | |
" dt_cache = SectionTime.dt_cache\n", | |
" \n", | |
" self.start = dt_cache.get(start_string, None)\n", | |
" \n", | |
" if not self.start:\n", | |
" self.start = datetime.strptime(start_string, format_string)\n", | |
" dt_cache[start_string] = self.start\n", | |
" \n", | |
" self.end = dt_cache.get(end_string, None)\n", | |
" \n", | |
" if not self.end:\n", | |
" self.end = datetime.strptime(end_string, format_string)\n", | |
" dt_cache[end_string] = self.end\n", | |
" \n", | |
" self.start_string = self.start.strftime('%H:%M')\n", | |
" self.end_string = self.end.strftime('%H:%M')\n", | |
" \n", | |
" # Days\n", | |
" self.days = list(day_string)\n", | |
" \n", | |
" def is_conflict(self, other):\n", | |
" '''Check for a conflict with another SectionTime object.'''\n", | |
" if any([day in self.days for day in other.days]):\n", | |
" if other.start >= self.start and other.end <= self.end:\n", | |
" return True\n", | |
" elif self.start >= other.start and self.end <= other.end:\n", | |
" return True\n", | |
" else:\n", | |
" return False\n", | |
" \n", | |
" def __str__(self):\n", | |
" return '%s: %s-%s' % (', '.join(self.days), self.start_string, self.end_string)\n", | |
"\n", | |
"format_string = '%I:%M %p'\n", | |
"tr1 = SectionTime(df.time[831], 'MW', format_string)\n", | |
"tr2 = SectionTime(df.time[831], 'TU', format_string)\n", | |
"\n", | |
"tr1.is_conflict(tr2) # False\n", | |
"tr1.is_conflict(tr1) # True" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 18, | |
"text": [ | |
"True" | |
] | |
} | |
], | |
"prompt_number": 18 | |
}, | |
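{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note that `is_conflict` only flags a pair when one time range lies entirely inside the other, so partially overlapping sections are not counted. As a rough sketch (not used for the results in this notebook), a more general check could treat two ranges as overlapping when each starts before the other ends. The helper name `intervals_overlap` and the two example slots are hypothetical." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# Hypothetical helper, not part of the original analysis\n", | |
"def intervals_overlap(a, b):\n", | |
"    '''Return True if two SectionTime objects share a day and their times overlap at all.'''\n", | |
"    shares_day = any(day in a.days for day in b.days)\n", | |
"    return shares_day and a.start < b.end and b.start < a.end\n", | |
"\n", | |
"# 16:00-17:50 and 17:00-18:50 on the same days overlap only partially\n", | |
"early = SectionTime('04:00 pm-05:50 pm', 'MW', format_string)\n", | |
"late = SectionTime('05:00 pm-06:50 pm', 'MW', format_string)\n", | |
"\n", | |
"early.is_conflict(late) # not flagged: neither slot contains the other\n", | |
"intervals_overlap(early, late) # True" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |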
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we merge the `time` and `days` columns into a list of tuples, and then create a `SectionTime` for each pair." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"pairs = list(zip(df.time, df.days))\n", | |
"sectiontimes = []\n", | |
"\n", | |
"for pair in pairs:\n", | |
" sectiontimes.append(SectionTime(pair[0], pair[1], format_string))\n", | |
" \n", | |
"len(sectiontimes)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 41, | |
"text": [ | |
"832" | |
] | |
} | |
], | |
"prompt_number": 41 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Next, we'll take each `SectionTime`, compare it to the rest, and then store its number of conflicts in a dictionary keyed by the time's string form." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"results = {}\n", | |
"\n", | |
"for time in sectiontimes:\n", | |
" conflict_count = 0\n", | |
" \n", | |
" for each in sectiontimes:\n", | |
" if each != time:\n", | |
" if time.is_conflict(each):\n", | |
" conflict_count += 1\n", | |
" \n", | |
" results[str(time)] = conflict_count" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 42 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Finally, we convert our results into a `DataFrame`, sort by number of conflicts, and then output the 30 times with the fewest conflicts." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"results_df = pd.DataFrame(data=results.items(), columns=['time', 'conflicts'])\n", | |
"results_df.sort(columns=['conflicts'])[:30]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>time</th>\n", | |
" <th>conflicts</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>76 </th>\n", | |
" <td> S: 13:00-15:50</td>\n", | |
" <td> 0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>133</th>\n", | |
" <td> S: 09:00-11:50</td>\n", | |
" <td> 0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>153</th>\n", | |
" <td> F: 09:00-11:50</td>\n", | |
" <td> 0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>170</th>\n", | |
" <td> F: 13:00-15:50</td>\n", | |
" <td> 0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>203</th>\n", | |
" <td> R: 17:00-20:00</td>\n", | |
" <td> 1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>82 </th>\n", | |
" <td> U: 17:30-19:30</td>\n", | |
" <td> 1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>45 </th>\n", | |
" <td> U, T: 18:30-19:20</td>\n", | |
" <td> 1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>28 </th>\n", | |
" <td> R: 16:00-18:50</td>\n", | |
" <td> 2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>129</th>\n", | |
" <td> M, W: 18:30-19:45</td>\n", | |
" <td> 4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>73 </th>\n", | |
" <td> M, W: 17:30-18:50</td>\n", | |
" <td> 8</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>44 </th>\n", | |
" <td> W: 17:00-19:50</td>\n", | |
" <td> 9</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25 </th>\n", | |
" <td> U, T, R: 17:00-18:50</td>\n", | |
" <td> 9</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>64 </th>\n", | |
" <td> R: 11:00-11:50</td>\n", | |
" <td> 12</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>155</th>\n", | |
" <td> R: 10:00-10:50</td>\n", | |
" <td> 12</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>142</th>\n", | |
" <td> R: 10:00-11:50</td>\n", | |
" <td> 13</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>102</th>\n", | |
" <td> U, T: 17:00-18:15</td>\n", | |
" <td> 14</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>158</th>\n", | |
" <td> W: 16:00-17:50</td>\n", | |
" <td> 14</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>125</th>\n", | |
" <td> M, W: 17:00-18:50</td>\n", | |
" <td> 15</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>88 </th>\n", | |
" <td> U, T: 11:00-12:50</td>\n", | |
" <td> 17</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>178</th>\n", | |
" <td> U: 16:00-18:45</td>\n", | |
" <td> 18</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11 </th>\n", | |
" <td> W: 15:00-17:00</td>\n", | |
" <td> 18</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>96 </th>\n", | |
" <td> M: 17:00-18:00</td>\n", | |
" <td> 18</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>160</th>\n", | |
" <td> U: 16:00-17:50</td>\n", | |
" <td> 19</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6 </th>\n", | |
" <td> W: 16:00-20:00</td>\n", | |
" <td> 20</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10 </th>\n", | |
" <td> M, W: 17:00-18:15</td>\n", | |
" <td> 20</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14 </th>\n", | |
" <td> T: 16:00-18:50</td>\n", | |
" <td> 23</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>85 </th>\n", | |
" <td> M: 16:00-18:50</td>\n", | |
" <td> 25</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>195</th>\n", | |
" <td> M: 16:00-17:50</td>\n", | |
" <td> 25</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>174</th>\n", | |
" <td> M: 16:00-20:00</td>\n", | |
" <td> 26</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>191</th>\n", | |
" <td> U: 15:30-17:20</td>\n", | |
" <td> 26</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>30 rows \u00d7 2 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 45, | |
"text": [ | |
" time conflicts\n", | |
"76 S: 13:00-15:50 0\n", | |
"133 S: 09:00-11:50 0\n", | |
"153 F: 09:00-11:50 0\n", | |
"170 F: 13:00-15:50 0\n", | |
"203 R: 17:00-20:00 1\n", | |
"82 U: 17:30-19:30 1\n", | |
"45 U, T: 18:30-19:20 1\n", | |
"28 R: 16:00-18:50 2\n", | |
"129 M, W: 18:30-19:45 4\n", | |
"73 M, W: 17:30-18:50 8\n", | |
"44 W: 17:00-19:50 9\n", | |
"25 U, T, R: 17:00-18:50 9\n", | |
"64 R: 11:00-11:50 12\n", | |
"155 R: 10:00-10:50 12\n", | |
"142 R: 10:00-11:50 13\n", | |
"102 U, T: 17:00-18:15 14\n", | |
"158 W: 16:00-17:50 14\n", | |
"125 M, W: 17:00-18:50 15\n", | |
"88 U, T: 11:00-12:50 17\n", | |
"178 U: 16:00-18:45 18\n", | |
"11 W: 15:00-17:00 18\n", | |
"96 M: 17:00-18:00 18\n", | |
"160 U: 16:00-17:50 19\n", | |
"6 W: 16:00-20:00 20\n", | |
"10 M, W: 17:00-18:15 20\n", | |
"14 T: 16:00-18:50 23\n", | |
"85 M: 16:00-18:50 25\n", | |
"195 M: 16:00-17:50 25\n", | |
"174 M: 16:00-20:00 26\n", | |
"191 U: 15:30-17:20 26\n", | |
"\n", | |
"[30 rows x 2 columns]" | |
] | |
} | |
], | |
"prompt_number": 45 | |
}, | |
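{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`DataFrame.sort` was the sorting method when this notebook was written; later pandas releases removed it in favour of `sort_values`. Assuming a newer pandas, a sketch of the same ranking by conflict count would be the following (rows with equal counts may come out in a different order):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# Assumes a pandas release that provides DataFrame.sort_values\n", | |
"results_df.sort_values(by='conflicts').head(30)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |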
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The slots 11:00-12:50 (UT), 16:00-18:45 (U), and 15:30-17:20 (U) seem appropriate. I'll leave it up to the attendees to decide between these time slots." | |
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |