Created
January 29, 2014 23:16
Save BrianHicks/8699332 to your computer and use it in GitHub Desktop.
{ | |
"metadata": { | |
"name": "Bike Racks" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "# Bike Racks\n\nWe're going to try to find clusters of bike racks in Cincinnati. Why? Because John asked for something cool.\n\nFirst we need to grab the data. Fortunately, Socrata has this!" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "import requests\nimport csv\n\nracks = requests.get('https://cincinnati.demo.socrata.com/api/views/wi79-n3c6/rows.json?accessType=DOWNLOAD').json()", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "The data is in a bit of a weird format, but at least it's deserializable. The dictionary comprehension below maps each location name to its (latitude, longitude) coordinates." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "racks = {rack[9]: (float(rack[10][1]), float(rack[10][2])) for rack in racks['data']}\nprint racks.items()[0]", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "(u'Cincinnati Commerce Center', (39.102596310765705, -84.51330853175256))\n" | |
} | |
], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Next we'll need to define a grouping function. I tend to abuse `collections.Counter` for this sort of thing, rather than iterating over lists. The function rounds both components of each coordinate to a given number of decimal digits and returns a count of the points in each resulting group." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "from collections import Counter\n\ndef get_grouping_at(values, precision):\n    return Counter(\n        (round(lat, precision), round(lng, precision))\n        for lat, lng\n        in values\n    )", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Once we have our grouping function, we can see what the grouping looks like at a range of precisions. As you can see, the number of groups stops changing at 5 digits of precision (200 groups, one per rack location), so we'll go for 2 to get medium-sized areas without being huge. Of course, since these are decimal degrees, each extra digit shrinks the cells by a factor of ten; at 2 digits a cell is roughly 1 km on a side." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "for digits in range(0, 10):\n    groups = get_grouping_at(racks.values(), digits)\n    print \"{}:\\t{} groups\".format(digits, len(groups))", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "0:\t2 groups\n1:\t7 groups\n2:\t68 groups\n3:\t165 groups\n4:\t198 groups\n5:\t200 groups\n6:\t200 groups\n7:\t200 groups\n8:\t200 groups\n9:\t200 groups\n" | |
} | |
], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Finally, we'll list the ten largest groups and the number of bike racks each one contains." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "precision = 2\ngroups = get_grouping_at(racks.values(), precision)\nfor coords, count in groups.most_common(10):\n    print \"The area roughly centered at {coords[0]:0<5}, {coords[1]:0<6} has {count} bike racks.\".format(\n        coords=coords,\n        count=count\n    )\n\nprint \"Overall, there are {} groups\".format(len(groups))", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "The area roughly centered at 39.11, -84.50 has 52 bike racks.\nThe area roughly centered at 39.11, -84.51 has 43 bike racks.\nThe area roughly centered at 39.10, -84.51 has 25 bike racks.\nThe area roughly centered at 39.10, -84.52 has 10 bike racks.\nThe area roughly centered at 39.11, -84.52 has 8 bike racks.\nThe area roughly centered at 39.13, -84.52 has 7 bike racks.\nThe area roughly centered at 39.14, -84.51 has 6 bike racks.\nThe area roughly centered at 39.16, -84.54 has 6 bike racks.\nThe area roughly centered at 39.13, -84.51 has 5 bike racks.\nThe area roughly centered at 39.15, -84.43 has 5 bike racks.\nOverall, there are 68 groups\n" | |
} | |
], | |
"prompt_number": 5 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "That's about it. Of course, we could have used numpy/scipy/matplotlib to do something visual like heatmaps, but this is a quick and dirty way of getting at the data." | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
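
The notebook above needs a live Socrata endpoint and Python 2, so here is a minimal offline sketch of the same rounding-and-counting idea, written to run on Python 3. It reuses the notebook's `get_grouping_at` function. The first sample point is the coordinate printed in the notebook's output; the other three are made-up values for illustration. The helper `approx_cell_size_km` is not part of the notebook; it is a hypothetical addition that assumes roughly 111.32 km per degree of latitude and scales longitude by the cosine of the latitude.

```python
import math
from collections import Counter

# First point comes from the notebook output; the rest are hypothetical.
SAMPLE_COORDS = [
    (39.102596310765705, -84.51330853175256),
    (39.1031, -84.5129),  # hypothetical
    (39.1119, -84.5012),  # hypothetical
    (39.1502, -84.4301),  # hypothetical
]


def get_grouping_at(values, precision):
    """Count points per grid cell by rounding lat/lng to `precision` digits."""
    return Counter(
        (round(lat, precision), round(lng, precision))
        for lat, lng in values
    )


def approx_cell_size_km(precision, lat_deg):
    """Approximate (north-south, east-west) cell size in km.

    Assumption: ~111.32 km per degree of latitude, with longitude
    degrees shrinking by cos(latitude).
    """
    step = 10.0 ** -precision
    km_per_degree = 111.32
    north_south = step * km_per_degree
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


groups = get_grouping_at(SAMPLE_COORDS, 2)
for coords, count in groups.most_common():
    print("{}: {} point(s)".format(coords, count))

ns_km, ew_km = approx_cell_size_km(2, 39.1)
print("cell at 2 digits: about {:.2f} km x {:.2f} km".format(ns_km, ew_km))
```

With these sample points, the first two land in the same 2-digit cell and the other two each get their own, which is the same effect the notebook relies on to find dense areas.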