{ | |
"metadata": { | |
"language": "ruby", | |
"name": "", | |
"signature": "sha256:2fbb64dd42fafb2068217707704845a0bf7dd3341038ac4e28b0e4e7ab82fb48" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Finding shape consensus among multiple geo polygons\n", | |
"\n", | |
"One of the tasks in the [Building Inspector](http://buildinginspector.nypl.org/) is [fixing building footprints](http://buildinginspector.nypl.org/fix). The user is presented a map with an overlaid shape (red dots). The purpose is to draw the correct shape (or shapes, since the red overlay may cover multiple building footprints).\n", | |
"\n", | |
"Multiple people receive the same map and overlay. This notebook describes a process to find the resulting consensus (or mean) shape.\n", | |
"\n", | |
"Below is an example showing the map, the original polygon shown to each user (red dots) and the resulting polygons drawn (yellow)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"IRuby.html '<iframe src=\"http://jsfiddle.net/mgiraldo/pdkCb/3/embedded/result/\" width=500 height=400></iframe>'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<iframe src=\"http://jsfiddle.net/mgiraldo/pdkCb/2/embedded/result/\" width=500 height=400></iframe>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 1, | |
"text": [ | |
"\"<iframe src=\\\"http://jsfiddle.net/mgiraldo/pdkCb/2/embedded/result/\\\" width=500 height=400></iframe>\"" | |
] | |
} | |
], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"It is hard to see but there are 11 yellow polygons: one rectangle in the lower left part, one for the upper right part (both wrong), and 9 for the complete L-shaped building.\n", | |
"\n", | |
"# Requirements\n", | |
"\n", | |
"The process to find the geometry that best summarizes what is drawn by users has to take into account:\n", | |
"\n", | |
"1. an overlay may span _multiple_ polygons (red dots covering more than one building)\n", | |
"1. polygons may have any number of vertices greater or equal to three\n", | |
"1. users will not always draw the polygons the same way (eg: use more or fewer points)\n", | |
"\n", | |
"The process described in this notebook makes use of the [DBSCAN clustering algorithm](https://en.wikipedia.org/wiki/DBSCAN) to find an unknown amount of dense regions of points and determine the resulting geometries from there. The _input_ to this process will be a GeoJSON FeatureCollection containing all the polygons drawn by contributors that are associated to a given red overlay. the expected _output_ is a list of geo point arrays with the summary shapes determined by the algorithm.\n", | |
"\n", | |
"**All the necessary code is included** and should be executable by any machine that has the required Ruby gems installed. _This code was tested on Ruby 2.1.0._\n", | |
"\n", | |
"# Process\n", | |
"\n", | |
"First, we need the [RGeo](https://github.com/rgeo/rgeo) package along with its [GeoJSON component](https://github.com/rgeo/rgeo-geojson):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": true, | |
"input": [ | |
"require 'rgeo'\n", | |
"require 'rgeo-geojson'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 2, | |
"text": [ | |
"true" | |
] | |
} | |
], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We will use a [Ruby implementation](https://github.com/matiasinsaurralde/dbscan) of the [DBSCAN clustering algorithm](https://en.wikipedia.org/wiki/DBSCAN)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"require 'dbscan'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 3, | |
"text": [ | |
"true" | |
] | |
} | |
], | |
"prompt_number": 3 | |
}, | |
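To make the clustering step less of a black box, here is a minimal pure-Ruby sketch of the DBSCAN idea. It is an illustration under assumptions, not the API or the implementation of the `dbscan` gem required above: the function name `dbscan_sketch` and the parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours, counting the point itself) are chosen here for the example.

```ruby
# Minimal DBSCAN sketch (illustrative only; the notebook itself relies on the dbscan gem).
# points:  array of [x, y] pairs
# eps:     neighbourhood radius
# min_pts: minimum number of points (including the point itself) that make a region dense
# Returns one label per point: a cluster id (0, 1, ...) or -1 for noise.
def dbscan_sketch(points, eps, min_pts)
  labels = Array.new(points.size) # nil means "not visited yet"

  # Indices of all points within eps of point i (brute force, includes i itself).
  neighbours = lambda do |i|
    (0...points.size).select do |j|
      Math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1]) <= eps
    end
  end

  cluster = -1
  points.each_index do |i|
    next unless labels[i].nil?
    seeds = neighbours.call(i)
    if seeds.size < min_pts
      labels[i] = -1 # provisionally noise; may be claimed as a border point later
      next
    end
    cluster += 1
    labels[i] = cluster
    queue = seeds - [i]
    until queue.empty?
      j = queue.shift
      labels[j] = cluster if labels[j] == -1 # noise reachable from a core point becomes a border point
      next unless labels[j].nil?             # already assigned to a cluster
      labels[j] = cluster
      more = neighbours.call(j)
      queue.concat(more) if more.size >= min_pts # only core points expand the cluster
    end
  end
  labels
end

pts = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
       [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],
       [9.0, 0.0]]
labels = dbscan_sketch(pts, 0.2, 3)
p labels # => [0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is never passed in, which is why this family of algorithms suits an overlay that may cover an unknown number of buildings. The lone point ends up labelled as noise, which is how outlier polygons can be discarded.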
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"For visualization convenience in this notebook we will also use the awesome [Nyaplot](https://github.com/domitry/nyaplot), a D3-powered visualization library. I had to manually build it according to [the instructions](https://github.com/domitry/nyaplot#installation) since it is not yet in RubyGems.org." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"require 'nyaplot'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 4, | |
"text": [ | |
"true" | |
] | |
} | |
], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Initialize Nyaplot to work in this IRuby Notebook:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"Nyaplot.init_iruby" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<script>\n", | |
"if(window['d3'] === undefined ||\n", | |
" window['Nyaplot'] === undefined){\n", | |
" var path = {\"d3\":\"http://d3js.org/d3.v3.min\"};\n", | |
"\n", | |
"\n", | |
"\n", | |
" var shim = {\"d3\":{\"exports\":\"d3\"}};\n", | |
"\n", | |
" require.config({paths: path, shim:shim});\n", | |
"\n", | |
"\n", | |
"require(['d3'], function(d3){window['d3']=d3;console.log('finished loading d3');\n", | |
"\n", | |
"\tvar script = d3.select(\"head\")\n", | |
"\t .append(\"script\")\n", | |
"\t .attr(\"src\", \"https://rawgit.com/domitry/Nyaplotjs/master/release/nyaplot.js\")\n", | |
"\t .attr(\"async\", true);\n", | |
"\n", | |
"\tscript[0][0].onload = script[0][0].onreadystatechange = function(){\n", | |
"\t var event = document.createEvent(\"HTMLEvents\");\n", | |
"\t event.initEvent(\"load_nyaplot\",false,false);\n", | |
"\t window.dispatchEvent(event);\n", | |
"\t console.log('Finished loading Nyaplotjs');\n", | |
"\t};\n", | |
"\n", | |
"\n", | |
"});\n", | |
"}\n", | |
"</script>\n" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 5, | |
"text": [ | |
"\"<script>\\nif(window['d3'] === undefined ||\\n window['Nyaplot'] === undefined){\\n var path = {\\\"d3\\\":\\\"http://d3js.org/d3.v3.min\\\"};\\n\\n\\n\\n var shim = {\\\"d3\\\":{\\\"exports\\\":\\\"d3\\\"}};\\n\\n require.config({paths: path, shim:shim});\\n\\n\\nrequire(['d3'], function(d3){window['d3']=d3;console.log('finished loading d3');\\n\\n\\tvar script = d3.select(\\\"head\\\")\\n\\t .append(\\\"script\\\")\\n\\t .attr(\\\"src\\\", \\\"https://rawgit.com/domitry/Nyaplotjs/master/release/nyaplot.js\\\")\\n\\t .attr(\\\"async\\\", true);\\n\\n\\tscript[0][0].onload = script[0][0].onreadystatechange = function(){\\n\\t var event = document.createEvent(\\\"HTMLEvents\\\");\\n\\t event.initEvent(\\\"load_nyaplot\\\",false,false);\\n\\t window.dispatchEvent(event);\\n\\t console.log('Finished loading Nyaplotjs');\\n\\t};\\n\\n\\n});\\n}\\n</script>\\n\"" | |
] | |
} | |
], | |
"prompt_number": 5 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This is the GeoJSON that describes the shapes that have been drawn by the different contributors:\n", | |
"\n", | |
"_Note: this GeoJSON will not validate in [GeoJSONLint](http://geojsonlint.com/) because first and last points do not match_" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"geomstr = '{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"user_id\":638},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98620970547199,40.7356342514617],[-73.98627072572708,40.735547874977094],[-73.98632504045963,40.73557226364293],[-73.98622445762157,40.73570995781772],[-73.9861835539341,40.73569268254945],[-73.98621775209902,40.735640856717666]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":666},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98620769381522,40.73563526765495],[-73.9862660318613,40.735547874977094],[-73.98632504045963,40.735570739351566],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569217445325],[-73.98621775209902,40.73563933242788]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"79e7ee062a9e0333926e3e1fdc3e92db\"},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632369935513,40.735570739351566],[-73.98622512817383,40.73570944972167],[-73.98618154227734,40.73569014206842],[-73.98621909320354,40.735640856717666],[-73.98620970547199,40.73563526765495],[-73.98627005517483,40.73554889117169]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"3d3003b26bb6b2f3b9577924b9ed5e0e\"},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98621842265129,40.7356423810074],[-73.98620903491974,40.73563577575159],[-73.98627139627934,40.735547874977094],[-73.98632436990738,40.735571755545806],[-73.98622579872608,40.73570995781772],[-73.98618087172508,40.735689633972214]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":596},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98626938462257,40.73554889117167],[-73.98632369935513,40.735572771740024],[-73.98622445762157,40.73570894162559],[-73.98618154227734,40.73569065016463],[-73.98621775209902,40.735640856717666],[-73.98620836436749,40.735634251461676]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"0afaf74383ce51aceba02fc49ce5a9e3\"},\"geometry\":{\"type\":\"Polygon\",\"coor
dinates\":[[[-73.98621775209902,40.73563984052446],[-73.98620836436749,40.73563272717173],[-73.98626938462257,40.735550415463514],[-73.98632235825062,40.73557124744871],[-73.98622360456956,40.73570641325812],[-73.98618768252459,40.73568957578454]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":538},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632571101189,40.735571755545806],[-73.98622378706932,40.73570995781772],[-73.98618288338184,40.73569268254945],[-73.98621775209902,40.73564034862108],[-73.9862110465765,40.7356362838482],[-73.98627005517483,40.735550923560815]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":580},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632436990738,40.73557124744871],[-73.98626066744328,40.7356581319994],[-73.98625999689102,40.7356581319994],[-73.98620903491974,40.735634759558316],[-73.98626804351805,40.735547874977094]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":580},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98626133799553,40.7356581319994],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569166635704],[-73.98621842265129,40.73563984052446]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":548},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98620970547199,40.73563475955834],[-73.98627005517483,40.73554990736624],[-73.98632369935513,40.735571755545806],[-73.98622360456956,40.73570641325812],[-73.9861848950386,40.735689633972214],[-73.98621842265129,40.735640856717666]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"53056025663f6d6564a39975971cb87c\"},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98621909320354,40.735638316234656],[-73.98620836436749,40.7356362838482],[-73.98620769381522,40.73563577575159],[-73.98627005517483,40.73554939926897],[-73.98632302880287,40.73557023125444],[-73.98622360456956,40.73570641325812],[-73.98617953062057,40.735689633972214]]]}}]}'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 6, | |
"text": [ | |
"\"{\\\"type\\\":\\\"FeatureCollection\\\",\\\"features\\\":[{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":638},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98620970547199,40.7356342514617],[-73.98627072572708,40.735547874977094],[-73.98632504045963,40.73557226364293],[-73.98622445762157,40.73570995781772],[-73.9861835539341,40.73569268254945],[-73.98621775209902,40.735640856717666]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":666},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98620769381522,40.73563526765495],[-73.9862660318613,40.735547874977094],[-73.98632504045963,40.735570739351566],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569217445325],[-73.98621775209902,40.73563933242788]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"79e7ee062a9e0333926e3e1fdc3e92db\\\"},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98632369935513,40.735570739351566],[-73.98622512817383,40.73570944972167],[-73.98618154227734,40.73569014206842],[-73.98621909320354,40.735640856717666],[-73.98620970547199,40.73563526765495],[-73.98627005517483,40.73554889117169]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"3d3003b26bb6b2f3b9577924b9ed5e0e\\\"},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98621842265129,40.7356423810074],[-73.98620903491974,40.73563577575159],[-73.98627139627934,40.735547874977094],[-73.98632436990738,40.735571755545806],[-73.98622579872608,40.73570995781772],[-73.98618087172508,40.735689633972214]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":596},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98626938462257,40.73554889117167],[-73.98632369935513,40.735572771740024],[-73.98622445762157,40.73570894162559],[-73.98618154227734,40.73569065016463],[-73.98621775209902,40.735640856717666],[-73.
98620836436749,40.735634251461676]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"0afaf74383ce51aceba02fc49ce5a9e3\\\"},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98621775209902,40.73563984052446],[-73.98620836436749,40.73563272717173],[-73.98626938462257,40.735550415463514],[-73.98632235825062,40.73557124744871],[-73.98622360456956,40.73570641325812],[-73.98618768252459,40.73568957578454]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":538},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98632571101189,40.735571755545806],[-73.98622378706932,40.73570995781772],[-73.98618288338184,40.73569268254945],[-73.98621775209902,40.73564034862108],[-73.9862110465765,40.7356362838482],[-73.98627005517483,40.735550923560815]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":580},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98632436990738,40.73557124744871],[-73.98626066744328,40.7356581319994],[-73.98625999689102,40.7356581319994],[-73.98620903491974,40.735634759558316],[-73.98626804351805,40.735547874977094]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":580},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98626133799553,40.7356581319994],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569166635704],[-73.98621842265129,40.73563984052446]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":548},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98620970547199,40.73563475955834],[-73.98627005517483,40.73554990736624],[-73.98632369935513,40.735571755545806],[-73.98622360456956,40.73570641325812],[-73.9861848950386,40.735689633972214],[-73.98621842265129,40.735640856717666]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"53056025663f6d6564a39975971cb87c\\\"},\\\"geometry\\\":{\\\"type\\\":\\\"
Polygon\\\",\\\"coordinates\\\":[[[-73.98621909320354,40.735638316234656],[-73.98620836436749,40.7356362838482],[-73.98620769381522,40.73563577575159],[-73.98627005517483,40.73554939926897],[-73.98632302880287,40.73557023125444],[-73.98622360456956,40.73570641325812],[-73.98617953062057,40.735689633972214]]]}}]}\"" | |
] | |
} | |
], | |
"prompt_number": 6 | |
}, | |
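If a strictly valid GeoJSON string is needed (for a linter or another tool), the rings can be closed before use. The sketch below uses only Ruby's standard `json` library; `close_rings` is a hypothetical helper that is not part of the notebook. The decoded geometry printed further down already repeats its first point at the end, which suggests RGeo closes the ring on its own, so this step is optional for the rest of the process.

```ruby
require 'json'

# Hypothetical helper (not part of the notebook): returns a copy of a GeoJSON
# FeatureCollection string in which every Polygon ring is explicitly closed,
# i.e. its last position equals its first.
def close_rings(geojson_str)
  data = JSON.parse(geojson_str)
  data['features'].each do |feature|
    geometry = feature['geometry']
    next unless geometry && geometry['type'] == 'Polygon'
    geometry['coordinates'].each do |ring|
      ring << ring.first.dup unless ring.empty? || ring.first == ring.last
    end
  end
  JSON.generate(data)
end

sample = '{"type":"FeatureCollection","features":[{"type":"Feature","properties":{},' \
         '"geometry":{"type":"Polygon","coordinates":[[[0,0],[1,0],[1,1]]]}}]}'
ring = JSON.parse(close_rings(sample))['features'][0]['geometry']['coordinates'][0]
p ring # => [[0, 0], [1, 0], [1, 1], [0, 0]]
```

Applied to the notebook's data this would be `close_rings(geomstr)`. Rings that are already closed are left untouched.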
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We decode the GeoJSON into a `RGeo::GeoJSON` structure (see the [RGeo::GeoJSON docs](http://rdoc.info/github/rgeo/rgeo-geojson/frames)):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"geocollection = RGeo::GeoJSON.decode(geomstr, :json_parser => :json)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 7, | |
"text": [ | |
"#<RGeo::GeoJSON::FeatureCollection:0x8218e4f8>" | |
] | |
} | |
], | |
"prompt_number": 7 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We wrap this in a function for convenience:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def parse(json)\n", | |
" RGeo::GeoJSON.decode(json, :json_parser => :json)\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 8, | |
"text": [ | |
":parse" | |
] | |
} | |
], | |
"prompt_number": 8 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This structure is now a group of [features](http://rdoc.info/github/rgeo/rgeo-geojson/RGeo/GeoJSON/Feature), each with an [RGeo::Geos::CAPIPolygonImpl](http://rdoc.info/github/rgeo/rgeo/RGeo/Geos/CAPIPolygonImpl) geometry describing each polygon, among other properties (see the [RGeo docs](http://rdoc.info/github/rgeo/rgeo/frames)):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"geocollection.first.geometry" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 9, | |
"text": [ | |
"#<RGeo::Geos::CAPIPolygonImpl:0x82193f48 \"POLYGON ((-73.98620970547199 40.7356342514617, -73.98627072572708 40.735547874977094, -73.98632504045963 40.73557226364293, -73.98622445762157 40.73570995781772, -73.9861835539341 40.73569268254945, -73.98621775209902 40.735640856717666, -73.98620970547199 40.7356342514617))\">" | |
] | |
} | |
], | |
"prompt_number": 9 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Algorithm\n", | |
"\n", | |
"The main logic behind this process is as follows:\n", | |
"\n", | |
"1. cluster all the polygons by their centroids (similar-shaped polygons should have similar centroids<sup>[1]</sup>, clustering will let us identify outliers)\n", | |
"1. only use clusters that have three or more centroids (three or more people drew similar-shaped polygons)\n", | |
"1. for each cluster:\n", | |
" 1. cluster the vertices of its polygons\n", | |
" 1. find the mean vertex describing each cluster\n", | |
" 1. connect those mean vertices in the most likely order\n", | |
" 1. verify that the connected polygon makes sense (will explain better below)\n", | |
"\n", | |
"[1] _different polygons might also have similar centroids but we're skipping this for now :)_\n", | |
"\n", | |
"Since DBSCAN works with number arrays, we need to convert the complex RGeo structures. Below a simple centroid-extraction function:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def get_centroid(poly_feature)\n", | |
" return if (poly_feature.geometry.geometry_type.type_name != \"Polygon\")\n", | |
" c = poly_feature.geometry.centroid\n", | |
" return [c.x, c.y]\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 10, | |
"text": [ | |
":get_centroid" | |
] | |
} | |
], | |
"prompt_number": 10 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's test it with the first polygon in the collection:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"centroid = get_centroid(geocollection.first)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 11, | |
"text": [ | |
"[-73.98625268168838, 40.73562601945317]" | |
] | |
} | |
], | |
"prompt_number": 11 | |
}, | |
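The `centroid` call above is delegated to RGeo. For intuition, the centroid of a simple planar polygon can be computed with the standard area-weighted (shoelace) formula. The sketch below is a plain-Ruby illustration of that formula, not what RGeo does internally. `polygon_centroid_sketch` is a hypothetical name, and the ring is assumed to be non-self-intersecting and given in planar coordinates.

```ruby
# Area-weighted centroid of a simple planar polygon (shoelace formula).
# ring: array of [x, y] vertices, open or closed. Illustrative helper only.
def polygon_centroid_sketch(ring)
  pts = ring.first == ring.last ? ring[0...-1] : ring
  area2 = 0.0 # twice the signed area
  cx = 0.0
  cy = 0.0
  pts.each_with_index do |(x0, y0), i|
    x1, y1 = pts[(i + 1) % pts.size]
    cross = x0 * y1 - x1 * y0
    area2 += cross
    cx += (x0 + x1) * cross
    cy += (y0 + y1) * cross
  end
  raise ArgumentError, 'degenerate polygon (zero area)' if area2.zero?
  # Cx = sum((x0 + x1) * cross) / (6 * A), and 6 * A == 3 * area2
  [cx / (3.0 * area2), cy / (3.0 * area2)]
end

square = [[0, 0], [2, 0], [2, 2], [0, 2]]
p polygon_centroid_sketch(square) # => [1.0, 1.0]

# An L-shaped footprint: the centroid is pulled towards the corner of the L.
l_shape = [[0, 0], [2, 0], [2, 1], [1, 1], [1, 2], [0, 2]]
l_centroid = polygon_centroid_sketch(l_shape)
```

With raw longitude/latitude values, which are large numbers that differ only in their last digits, this formula loses floating-point precision. Subtracting a local origin from every vertex first (and adding it back to the result) is one way around that.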
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we need a convenience function to get all the centroids of the collection. We will make it a hash because we later need to be able to go back to this list to extract its corresponding set of polygons and a hash was the way I found most convenient:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def get_all_centroids(geom)\n", | |
" centroids = {}\n", | |
" geom.each_with_index do |poly,index|\n", | |
" next if (poly.geometry.geometry_type.type_name != \"Polygon\")\n", | |
" centroids[index] = get_centroid(poly)\n", | |
" end\n", | |
" return centroids\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 12, | |
"text": [ | |
":get_all_centroids" | |
] | |
} | |
], | |
"prompt_number": 12 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Test again:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"centroids = get_all_centroids(geocollection)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 13, | |
"text": [ | |
"{0=>[-73.98625268168838, 40.73562601945317], 1=>[-73.98625173238652, 40.735625569382876], 2=>[-73.9862518966646, 40.73562642272427], 3=>[-73.986252242017, 40.735626656082445], 4=>[-73.98625152460835, 40.735626229414], 5=>[-73.98625207318744, 40.73562418649854], 6=>[-73.98625258509149, 40.7356272053874], 7=>[-73.98626592099406, 40.735602617283476], 8=>[-73.9862216645921, 40.73567482334759], 9=>[-73.98625254867669, 40.735624721075084], 10=>[-73.98625077341322, 40.73562552211442]}" | |
] | |
} | |
], | |
"prompt_number": 13 | |
}, | |
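Keying the centroids by polygon index pays off in step 2 of the algorithm, where only clusters with three or more centroids are kept. The sketch below shows one way that filter could look once every centroid has a cluster label. `consensus_clusters` and the `labels_by_index` hash are hypothetical names, and the labels are made up for illustration (polygons 7 and 8 treated as noise, marked `-1`); they are not output of the `dbscan` gem.

```ruby
# labels_by_index: hypothetical hash { polygon index => cluster label },
# where -1 marks a centroid that belongs to no cluster.
# Returns { cluster label => [polygon indexes] } for clusters that have at
# least min_size members.
def consensus_clusters(labels_by_index, min_size = 3)
  groups = Hash.new { |hash, label| hash[label] = [] }
  labels_by_index.each do |index, label|
    next if label == -1
    groups[label] << index
  end
  groups.select { |_label, indexes| indexes.size >= min_size }
end

# Made-up labels mirroring the example: nine similar polygons and two outliers.
example_labels = { 0 => 0, 1 => 0, 2 => 0, 3 => 0, 4 => 0, 5 => 0, 6 => 0,
                   7 => -1, 8 => -1, 9 => 0, 10 => 0 }
kept = consensus_clusters(example_labels)
p kept # => {0=>[0, 1, 2, 3, 4, 5, 6, 9, 10]}
```

The surviving indexes can then be used to look the original polygons up again in the feature collection.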
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A simple plot of all the centroids using Nyaplot:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"plot = Nyaplot::Plot.new\n", | |
"plot.width(400)\n", | |
"plot.height(400)\n", | |
"plot.zoom(true)\n", | |
"points_x = centroids.map { |p| p[1][0] }\n", | |
"points_y = centroids.map { |p| p[1][1] }\n", | |
"df = Nyaplot::DataFrame.new({x:points_x,y:points_y})\n", | |
"# add some padding\n", | |
"xmin = points_x.min - 1e-5\n", | |
"xmax = points_x.max + 1e-5\n", | |
"ymin = points_y.min - 1e-5\n", | |
"ymax = points_y.max + 1e-5\n", | |
"plot.xrange([xmin,xmax])\n", | |
"plot.yrange([ymin,ymax])\n", | |
"# end padding\n", | |
"sc = plot.add_with_df(df, :scatter, :x, :y)\n", | |
"plot.show" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div id='vis-e0d8c999-6b40-4a41-a8ef-e3e777e213e7'></div>\n", | |
"<script>\n", | |
"(function(){\n", | |
" var render = function(){\n", | |
" var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\"},\"data\":\"91387de3-8aed-4aa0-9fba-1faedd43e2a8\"}],\"options\":{\"width\":400,\"height\":400,\"zoom\":true,\"xrange\":[-73.98627592099406,-73.98621166459209],\"yrange\":[40.73559261728347,40.73568482334759]}}],\"data\":{\"91387de3-8aed-4aa0-9fba-1faedd43e2a8\":[{\"x\":-73.98625268168838,\"y\":40.73562601945317},{\"x\":-73.98625173238652,\"y\":40.735625569382876},{\"x\":-73.9862518966646,\"y\":40.73562642272427},{\"x\":-73.986252242017,\"y\":40.735626656082445},{\"x\":-73.98625152460835,\"y\":40.735626229414},{\"x\":-73.98625207318744,\"y\":40.73562418649854},{\"x\":-73.98625258509149,\"y\":40.7356272053874},{\"x\":-73.98626592099406,\"y\":40.735602617283476},{\"x\":-73.9862216645921,\"y\":40.73567482334759},{\"x\":-73.98625254867669,\"y\":40.735624721075084},{\"x\":-73.98625077341322,\"y\":40.73562552211442}]},\"extension\":[]}\n", | |
" Nyaplot.core.parse(model, '#vis-e0d8c999-6b40-4a41-a8ef-e3e777e213e7');\n", | |
" };\n", | |
" if(window['Nyaplot']==undefined){\n", | |
" window.addEventListener('load_nyaplot', render, false);\n", | |
"\treturn;\n", | |
" }\n", | |
" render();\n", | |
"})();\n", | |
"</script>\n" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 14, | |
"text": [ | |
"\"<div id='vis-e0d8c999-6b40-4a41-a8ef-e3e777e213e7'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\"},\\\"data\\\":\\\"91387de3-8aed-4aa0-9fba-1faedd43e2a8\\\"}],\\\"options\\\":{\\\"width\\\":400,\\\"height\\\":400,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98627592099406,-73.98621166459209],\\\"yrange\\\":[40.73559261728347,40.73568482334759]}}],\\\"data\\\":{\\\"91387de3-8aed-4aa0-9fba-1faedd43e2a8\\\":[{\\\"x\\\":-73.98625268168838,\\\"y\\\":40.73562601945317},{\\\"x\\\":-73.98625173238652,\\\"y\\\":40.735625569382876},{\\\"x\\\":-73.9862518966646,\\\"y\\\":40.73562642272427},{\\\"x\\\":-73.986252242017,\\\"y\\\":40.735626656082445},{\\\"x\\\":-73.98625152460835,\\\"y\\\":40.735626229414},{\\\"x\\\":-73.98625207318744,\\\"y\\\":40.73562418649854},{\\\"x\\\":-73.98625258509149,\\\"y\\\":40.7356272053874},{\\\"x\\\":-73.98626592099406,\\\"y\\\":40.735602617283476},{\\\"x\\\":-73.9862216645921,\\\"y\\\":40.73567482334759},{\\\"x\\\":-73.98625254867669,\\\"y\\\":40.735624721075084},{\\\"x\\\":-73.98625077341322,\\\"y\\\":40.73562552211442}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-e0d8c999-6b40-4a41-a8ef-e3e777e213e7');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
] | |
} | |
], | |
"prompt_number": 14 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 1. Clustering centroids\n", | |
"\n", | |
"We can see here how the centroids reflect the three different basic shapes drawn by contributors above: the lone centroids for the upper-right and lower-left rectangles and the group of nine centroids for the L-shaped polygons in the \"center\".\n", | |
"\n", | |
"The problem now is finding a good minimum distance between centroids:\n", | |
"\n", | |
"- **big** enough to cover nearby centroids but also\n", | |
"- **small** enough to _not_ group polygons that don't belong with each other\n", | |
"\n", | |
"Let's create a table to see just how close/far these centroids are from each other (standard euclidean distance: $\\sqrt{((\\Delta x)^2+(\\Delta y)^2)}$). Notice that, since geographic metric units have a _lot_ of significant digits (numbers to the right of the decimal point), we are dealing with distances smaller than $10^{-6}$: " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": true, | |
"input": [ | |
"dists = []\n", | |
"done = {}\n", | |
"centroids.each_with_index do |cc1,i|\n", | |
" centroids.each_with_index do |cc2,j|\n", | |
" c1 = cc1[1]\n", | |
" c2 = cc2[1]\n", | |
" dists.push({:dist=>Math.hypot(c1[0]-c2[0],c1[1]-c2[1]),:from=>i,:to=>j,:from_centroid=>c1,:to_centroid=>c2}) if (c1 != c2 && !done[[c2,c1]]) \n", | |
" done[[c1,c2]] = true\n", | |
" end\n", | |
"end\n", | |
"dists = dists.sort_by!{|k| k[:dist]}\n", | |
"dist_df = Nyaplot::DataFrame.new(dists)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<table><tr><th>dist</th><th>from</th><th>to</th><th>from_centroid</th><th>to_centroid</th></tr><tr><td>4.1680249477628687e-07</td><td>2</td><td>3</td><td>[-73.9862518966646, 40.73562642272427]</td><td>[-73.986252242017, 40.735626656082445]</td></tr><tr><td>4.1927880127312373e-07</td><td>2</td><td>4</td><td>[-73.9862518966646, 40.73562642272427]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>6.476388201422145e-07</td><td>3</td><td>6</td><td>[-73.986252242017, 40.735626656082445]</td><td>[-73.98625258509149, 40.7356272053874]</td></tr><tr><td>6.919630457708901e-07</td><td>1</td><td>4</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>7.154453870992346e-07</td><td>5</td><td>9</td><td>[-73.98625207318744, 40.73562418649854]</td><td>[-73.98625254867669, 40.735624721075084]</td></tr><tr><td>7.736974578659688e-07</td><td>0</td><td>3</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.986252242017, 40.735626656082445]</td></tr><tr><td>8.346982305084655e-07</td><td>3</td><td>4</td><td>[-73.986252242017, 40.735626656082445]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>8.690102573992017e-07</td><td>1</td><td>2</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.9862518966646, 40.73562642272427]</td></tr><tr><td>8.825474076951457e-07</td><td>0</td><td>2</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.9862518966646, 40.73562642272427]</td></tr><tr><td>9.601375433120375e-07</td><td>1</td><td>10</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.98625077341322, 40.73562552211442]</td></tr><tr><td>1.0317784775226883e-06</td><td>4</td><td>10</td><td>[-73.98625152460835, 40.735626229414]</td><td>[-73.98625077341322, 40.73562552211442]</td></tr><tr><td>1.0423498270990819e-06</td><td>2</td><td>6</td><td>[-73.9862518966646, 40.73562642272427]</td><td>[-73.98625258509149, 
40.7356272053874]</td></tr><tr><td>1.0505890250970463e-06</td><td>0</td><td>1</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.98625173238652, 40.735625569382876]</td></tr><tr><td>1.1759752372883875e-06</td><td>0</td><td>4</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>1.1772662180684655e-06</td><td>1</td><td>9</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.98625254867669, 40.735624721075084]</td></tr><tr><td>1.1898617396243328e-06</td><td>0</td><td>6</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.98625258509149, 40.7356272053874]</td></tr><tr><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr><tr><td>8.468969718424401e-05</td><td>7</td><td>8</td><td>[-73.98626592099406, 40.735602617283476]</td><td>[-73.9862216645921, 40.73567482334759]</td></tr></table>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 81, | |
"text": [ | |
"#<Nyaplot::DataFrame:0x0000010278d888 @name=\"53ef42c2-a4b6-4d3b-b363-51cbd8018f32\", @rows=[{\"dist\"=>4.1680249477628687e-07, \"from\"=>2, \"to\"=>3, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.986252242017, 40.735626656082445]}, {\"dist\"=>4.1927880127312373e-07, \"from\"=>2, \"to\"=>4, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>6.476388201422145e-07, \"from\"=>3, \"to\"=>6, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>6.919630457708901e-07, \"from\"=>1, \"to\"=>4, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>7.154453870992346e-07, \"from\"=>5, \"to\"=>9, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>7.736974578659688e-07, \"from\"=>0, \"to\"=>3, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.986252242017, 40.735626656082445]}, {\"dist\"=>8.346982305084655e-07, \"from\"=>3, \"to\"=>4, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>8.690102573992017e-07, \"from\"=>1, \"to\"=>2, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.9862518966646, 40.73562642272427]}, {\"dist\"=>8.825474076951457e-07, \"from\"=>0, \"to\"=>2, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.9862518966646, 40.73562642272427]}, {\"dist\"=>9.601375433120375e-07, \"from\"=>1, \"to\"=>10, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.0317784775226883e-06, \"from\"=>4, \"to\"=>10, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], 
\"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.0423498270990819e-06, \"from\"=>2, \"to\"=>6, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.0505890250970463e-06, \"from\"=>0, \"to\"=>1, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625173238652, 40.735625569382876]}, {\"dist\"=>1.1759752372883875e-06, \"from\"=>0, \"to\"=>4, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>1.1772662180684655e-06, \"from\"=>1, \"to\"=>9, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.1898617396243328e-06, \"from\"=>0, \"to\"=>6, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.2002662963773613e-06, \"from\"=>1, \"to\"=>3, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.986252242017, 40.735626656082445]}, {\"dist\"=>1.3051734635243496e-06, \"from\"=>0, \"to\"=>9, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.424259229759705e-06, \"from\"=>1, \"to\"=>5, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>1.4397193384948688e-06, \"from\"=>2, \"to\"=>10, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.441231617061079e-06, \"from\"=>4, \"to\"=>6, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.8222869506551436e-06, \"from\"=>2, \"to\"=>9, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625254867669, 
40.735624721075084]}, {\"dist\"=>1.8231297971933801e-06, \"from\"=>4, \"to\"=>9, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.844889313547645e-06, \"from\"=>1, \"to\"=>6, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.8554461894036756e-06, \"from\"=>3, \"to\"=>10, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.8636745452851083e-06, \"from\"=>5, \"to\"=>10, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.9313197748456895e-06, \"from\"=>0, \"to\"=>5, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>1.947620190077808e-06, \"from\"=>9, \"to\"=>10, \"from_centroid\"=>[-73.98625254867669, 40.735624721075084], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.9591563623025234e-06, \"from\"=>3, \"to\"=>9, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.972019255979438e-06, \"from\"=>0, \"to\"=>10, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>2.115287830795299e-06, \"from\"=>4, \"to\"=>5, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>2.2431820796116284e-06, \"from\"=>2, \"to\"=>5, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>2.4729711093657763e-06, \"from\"=>6, \"to\"=>10, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, 
{\"dist\"=>2.475348072708112e-06, \"from\"=>3, \"to\"=>5, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>2.4845791864857814e-06, \"from\"=>6, \"to\"=>9, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>3.0619823175285526e-06, \"from\"=>5, \"to\"=>6, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>2.563187052301914e-05, \"from\"=>5, \"to\"=>7, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.5834017791549575e-05, \"from\"=>7, \"to\"=>9, \"from_centroid\"=>[-73.98626592099406, 40.735602617283476], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>2.688755773883865e-05, \"from\"=>0, \"to\"=>7, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.698361448529006e-05, \"from\"=>1, \"to\"=>7, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.7460525955903987e-05, \"from\"=>7, \"to\"=>10, \"from_centroid\"=>[-73.98626592099406, 40.735602617283476], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>2.7629347229741968e-05, \"from\"=>2, \"to\"=>7, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.7654812048911204e-05, \"from\"=>4, \"to\"=>7, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.765824052887831e-05, \"from\"=>3, \"to\"=>7, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.797179207560266e-05, 
\"from\"=>6, \"to\"=>7, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>5.677629272142381e-05, \"from\"=>6, \"to\"=>8, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.7034997606567785e-05, \"from\"=>4, \"to\"=>8, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.7053171211386354e-05, \"from\"=>3, \"to\"=>8, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.706661497797782e-05, \"from\"=>2, \"to\"=>8, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.725325369972539e-05, \"from\"=>8, \"to\"=>10, \"from_centroid\"=>[-73.9862216645921, 40.73567482334759], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>5.7706371411325726e-05, \"from\"=>1, \"to\"=>8, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.782629481832627e-05, \"from\"=>0, \"to\"=>8, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.88563029015269e-05, \"from\"=>8, \"to\"=>9, \"from_centroid\"=>[-73.9862216645921, 40.73567482334759], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>5.906583744015411e-05, \"from\"=>5, \"to\"=>8, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>8.468969718424401e-05, \"from\"=>7, \"to\"=>8, \"from_centroid\"=>[-73.98626592099406, 40.735602617283476], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}]>" | |
] | |
} | |
], | |
"prompt_number": 81 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"From the table (sorted by distance, closest pairs first) we can see that the top 10 results are less than $10^{-6}$ units (0.000001) away from each other.\n", | |
"\n", | |
"## The DBSCAN algorithm\n", | |
"\n", | |
"To understand how clusters are formed, it is useful to understand how the [DBSCAN clustering algorithm](https://en.wikipedia.org/wiki/DBSCAN#Algorithm) works:\n", | |
"\n", | |
"> DBSCAN requires two parameters: \u03b5 (eps) and the minimum number of points (min_points) required to form a dense region. It starts with an arbitrary starting point that has not been visited. This point's \u03b5-neighborhood is retrieved, and if it contains sufficiently many points, a cluster is started. Otherwise, the point is labeled as noise. Note that this point might later be found in a sufficiently sized \u03b5-environment of a different point and hence be made part of a cluster.\n", | |
"\n", | |
"> If a point is found to be a dense part of a cluster, its \u03b5-neighborhood is also part of that cluster. Hence, all points that are found within the \u03b5-neighborhood are added, as is their own \u03b5-neighborhood when they are also dense. This process continues until the density-connected cluster is completely found. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or noise.\n", | |
"\n", | |
"By experimenting with different sets of polygons I settled on a general \u03b5 of $1.8 \\times 10^{-6}$ and a `min_points` of 2 for **centroid clusters** (polygon vertex clusters use different input values, as we will see below).\n", | |
"\n", | |
"This is the resulting centroid-clustering function:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def cluster_centroids(centroids)\n", | |
" dbscan = DBSCAN( centroids.map{|c| c[1]}, :epsilon => 1.8e-06, :min_points => 2, :distance => :euclidean_distance )\n", | |
" return dbscan.results\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 73, | |
"text": [ | |
":cluster_centroids" | |
] | |
} | |
], | |
"prompt_number": 73 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's test it:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"centroid_clusters = cluster_centroids(centroids)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 74, | |
"text": [ | |
"{-1=>[[-73.98626592099406, 40.735602617283476], [-73.9862216645921, 40.73567482334759]], 0=>[[-73.98625268168838, 40.73562601945317], [-73.98625173238652, 40.735625569382876], [-73.9862518966646, 40.73562642272427], [-73.986252242017, 40.735626656082445], [-73.98625152460835, 40.735626229414], [-73.98625258509149, 40.7356272053874], [-73.98625254867669, 40.735624721075084], [-73.98625207318744, 40.73562418649854], [-73.98625077341322, 40.73562552211442]]}" | |
] | |
} | |
], | |
"prompt_number": 74 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The function returns a hash whose `-1` key (if present) holds all the points that did not belong to any cluster, while keys `0..n` hold the different clusters. In this example there is only one cluster, `centroid_clusters[0]`, plus the rejected `centroid_clusters[-1]` non-cluster.\n", | |
"\n", | |
"Let's define a cluster plotting function and plot this (notice the \"disappearance\" of the two outliers that are being ignored by the function):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def plot_clusters(clusters)\n", | |
" plot = Nyaplot::Plot.new\n", | |
" plot.width(300)\n", | |
" plot.height(400)\n", | |
" plot.zoom(true)\n", | |
" pts = clusters.map{|c| c[1]}.flatten(1)\n", | |
" # add some padding\n", | |
" xmin = pts.map {|p| p[0]}.min - 1e-5\n", | |
" xmax = pts.map {|p| p[0]}.max + 1e-5\n", | |
" ymin = pts.map {|p| p[1]}.min - 1e-5\n", | |
" ymax = pts.map {|p| p[1]}.max + 1e-5\n", | |
" plot.xrange([xmin,xmax])\n", | |
" plot.yrange([ymin,ymax])\n", | |
" plot.rotate_x_label(-60)\n", | |
" plot.x_label(\"\")\n", | |
" plot.y_label(\"\")\n", | |
" # now plot\n", | |
" clusters.each do |cluster|\n", | |
" if cluster[0] != -1 # ignore cluster -1 because not enough points\n", | |
" cluster_x = cluster[1].map { |c| c[0] }\n", | |
" cluster_y = cluster[1].map { |c| c[1] }\n", | |
" names = cluster[1].map { |c| cluster[0] }\n", | |
" df = Nyaplot::DataFrame.new({x:cluster_x,y:cluster_y,cluster:names})\n", | |
" sc = plot.add_with_df(df, :scatter, :x, :y)\n", | |
" sc.tooltip_contents([:cluster])\n", | |
" color = \"#\"+ \"%06x\" % (rand * 0xffffff)\n", | |
" sc.color(color)\n", | |
" end\n", | |
" end\n", | |
" plot.show\n", | |
" return plot\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 18, | |
"text": [ | |
":plot_clusters" | |
] | |
} | |
], | |
"prompt_number": 18 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"plot = plot_clusters(centroid_clusters)\n", | |
"plot.show" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div id='vis-3e93ca2b-76c3-4c1e-ac75-7e5166ea7790'></div>\n", | |
"<script>\n", | |
"(function(){\n", | |
" var render = function(){\n", | |
" var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#cf19de\"},\"data\":\"e1eebed5-39fb-4ef3-9746-ce8403d7b6bb\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"xrange\":[-73.98627592099406,-73.98621166459209],\"yrange\":[40.73559261728347,40.73568482334759],\"rotate_x_label\":-60,\"x_label\":\"\",\"y_label\":\"\"}}],\"data\":{\"e1eebed5-39fb-4ef3-9746-ce8403d7b6bb\":[{\"x\":-73.98625268168838,\"y\":40.73562601945317,\"cluster\":0},{\"x\":-73.98625173238652,\"y\":40.735625569382876,\"cluster\":0},{\"x\":-73.9862518966646,\"y\":40.73562642272427,\"cluster\":0},{\"x\":-73.986252242017,\"y\":40.735626656082445,\"cluster\":0},{\"x\":-73.98625152460835,\"y\":40.735626229414,\"cluster\":0},{\"x\":-73.98625258509149,\"y\":40.7356272053874,\"cluster\":0},{\"x\":-73.98625254867669,\"y\":40.735624721075084,\"cluster\":0},{\"x\":-73.98625207318744,\"y\":40.73562418649854,\"cluster\":0},{\"x\":-73.98625077341322,\"y\":40.73562552211442,\"cluster\":0}]},\"extension\":[]}\n", | |
" Nyaplot.core.parse(model, '#vis-3e93ca2b-76c3-4c1e-ac75-7e5166ea7790');\n", | |
" };\n", | |
" if(window['Nyaplot']==undefined){\n", | |
" window.addEventListener('load_nyaplot', render, false);\n", | |
"\treturn;\n", | |
" }\n", | |
" render();\n", | |
"})();\n", | |
"</script>\n" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 75, | |
"text": [ | |
"\"<div id='vis-3e93ca2b-76c3-4c1e-ac75-7e5166ea7790'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#cf19de\\\"},\\\"data\\\":\\\"e1eebed5-39fb-4ef3-9746-ce8403d7b6bb\\\"}],\\\"options\\\":{\\\"rotate_x_label\\\":-60,\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98627592099406,-73.98621166459209],\\\"yrange\\\":[40.73559261728347,40.73568482334759]}}],\\\"data\\\":{\\\"e1eebed5-39fb-4ef3-9746-ce8403d7b6bb\\\":[{\\\"x\\\":-73.98625268168838,\\\"y\\\":40.73562601945317,\\\"cluster\\\":0},{\\\"x\\\":-73.98625173238652,\\\"y\\\":40.735625569382876,\\\"cluster\\\":0},{\\\"x\\\":-73.9862518966646,\\\"y\\\":40.73562642272427,\\\"cluster\\\":0},{\\\"x\\\":-73.986252242017,\\\"y\\\":40.735626656082445,\\\"cluster\\\":0},{\\\"x\\\":-73.98625152460835,\\\"y\\\":40.735626229414,\\\"cluster\\\":0},{\\\"x\\\":-73.98625258509149,\\\"y\\\":40.7356272053874,\\\"cluster\\\":0},{\\\"x\\\":-73.98625254867669,\\\"y\\\":40.735624721075084,\\\"cluster\\\":0},{\\\"x\\\":-73.98625207318744,\\\"y\\\":40.73562418649854,\\\"cluster\\\":0},{\\\"x\\\":-73.98625077341322,\\\"y\\\":40.73562552211442,\\\"cluster\\\":0}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-3e93ca2b-76c3-4c1e-ac75-7e5166ea7790');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
] | |
} | |
], | |
"prompt_number": 75 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 2. Clustering vertices\n", | |
"\n", | |
"Now we need to:\n", | |
"\n", | |
"1. work backwards from the centroid clusters that have three or more centroids (only one in this case)\n", | |
"1. find the polygons they belong to and, finally,\n", | |
"1. find their vertices and cluster them\n", | |
"\n", | |
"Below is a function that retrieves the polygons for a given centroid cluster based on the structures we have built so far:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# given a list of centroids (lon,lat), find their poly's index in the centroid list (index => lon,lat)\n", | |
"def get_polys_for_centroid_cluster(cluster, centroids, original_polys)\n", | |
" polys = []\n", | |
" cluster.each do |cl|\n", | |
" index = centroids.key(cl) # first index whose centroid equals cl; nil when there is no match\n", | |
" polys.push(original_polys[index]) unless index.nil?\n", | |
" end\n", | |
" return polys\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 20, | |
"text": [ | |
":get_polys_for_centroid_cluster" | |
] | |
} | |
], | |
"prompt_number": 20 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Applying this to the only cluster that has useful centroids:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"cluster_polygons = get_polys_for_centroid_cluster(centroid_clusters[0], centroids, geocollection)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 21, | |
"text": [ | |
"[#<RGeo::GeoJSON::Feature:0x82193ed0 id=nil geom=\"POLYGON ((-73.98620970547199 40.7356342514617, -73.98627072572708 40.735547874977094, -73.98632504045963 40.73557226364293, -73.98622445762157 40.73570995781772, -73.9861835539341 40.73569268254945, -73.98621775209902 40.735640856717666, -73.98620970547199 40.7356342514617))\">, #<RGeo::GeoJSON::Feature:0x82193674 id=nil geom=\"POLYGON ((-73.98620769381522 40.73563526765495, -73.9862660318613 40.735547874977094, -73.98632504045963 40.735570739351566, -73.98622579872608 40.73570944972167, -73.98618154227734 40.73569217445325, -73.98621775209902 40.73563933242788, -73.98620769381522 40.73563526765495))\">, #<RGeo::GeoJSON::Feature:0x82193034 id=nil geom=\"POLYGON ((-73.98632369935513 40.735570739351566, -73.98622512817383 40.73570944972167, -73.98618154227734 40.73569014206842, -73.98621909320354 40.735640856717666, -73.98620970547199 40.73563526765495, -73.98627005517483 40.73554889117169, -73.98632369935513 40.735570739351566))\">, #<RGeo::GeoJSON::Feature:0x82192b20 id=nil geom=\"POLYGON ((-73.98621842265129 40.7356423810074, -73.98620903491974 40.73563577575159, -73.98627139627934 40.735547874977094, -73.98632436990738 40.735571755545806, -73.98622579872608 40.73570995781772, -73.98618087172508 40.735689633972214, -73.98621842265129 40.7356423810074))\">, #<RGeo::GeoJSON::Feature:0x82192620 id=nil geom=\"POLYGON ((-73.98626938462257 40.73554889117167, -73.98632369935513 40.735572771740024, -73.98622445762157 40.73570894162559, -73.98618154227734 40.73569065016463, -73.98621775209902 40.735640856717666, -73.98620836436749 40.735634251461676, -73.98626938462257 40.73554889117167))\">, #<RGeo::GeoJSON::Feature:0x8218fe48 id=nil geom=\"POLYGON ((-73.98632571101189 40.735571755545806, -73.98622378706932 40.73570995781772, -73.98618288338184 40.73569268254945, -73.98621775209902 40.73564034862108, -73.9862110465765 40.7356362838482, -73.98627005517483 40.735550923560815, -73.98632571101189 40.735571755545806))\">, 
#<RGeo::GeoJSON::Feature:0x8218ef5c id=nil geom=\"POLYGON ((-73.98620970547199 40.73563475955834, -73.98627005517483 40.73554990736624, -73.98632369935513 40.735571755545806, -73.98622360456956 40.73570641325812, -73.9861848950386 40.735689633972214, -73.98621842265129 40.735640856717666, -73.98620970547199 40.73563475955834))\">, #<RGeo::GeoJSON::Feature:0x82192120 id=nil geom=\"POLYGON ((-73.98621775209902 40.73563984052446, -73.98620836436749 40.73563272717173, -73.98626938462257 40.735550415463514, -73.98632235825062 40.73557124744871, -73.98622360456956 40.73570641325812, -73.98618768252459 40.73568957578454, -73.98621775209902 40.73563984052446))\">, #<RGeo::GeoJSON::Feature:0x8218e55c id=nil geom=\"POLYGON ((-73.98621909320354 40.735638316234656, -73.98620836436749 40.7356362838482, -73.98620769381522 40.73563577575159, -73.98627005517483 40.73554939926897, -73.98632302880287 40.73557023125444, -73.98622360456956 40.73570641325812, -73.98617953062057 40.735689633972214, -73.98621909320354 40.735638316234656))\">]" | |
] | |
} | |
], | |
"prompt_number": 21 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We need a method to extract the vertices from each polygon (in a DBSCAN-compatible format):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def get_points(poly_feature)\n", | |
" geom = poly_feature.geometry\n", | |
" return false if (geom.geometry_type.type_name != \"Polygon\")\n", | |
" pts = []\n", | |
" points = geom.exterior_ring.points\n", | |
" points.each do |point|\n", | |
" pts.push([point.x,point.y])\n", | |
" end\n", | |
" return pts\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 22, | |
"text": [ | |
":get_points" | |
] | |
} | |
], | |
"prompt_number": 22 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now let's plot what we have so far (vertices from the same polygon are the same color):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def plot_polys(polys)\n", | |
" plot = Nyaplot::Plot.new\n", | |
" plot.width(500)\n", | |
" plot.height(500)\n", | |
" plot.zoom(true)\n", | |
" polys.each do |poly|\n", | |
" plot_poly(poly, plot)\n", | |
" end\n", | |
" plot.show\n", | |
"end\n", | |
"def plot_poly(poly, plot = nil)\n", | |
" showplot = false\n", | |
" if plot == nil\n", | |
" showplot = true\n", | |
" plot = Nyaplot::Plot.new\n", | |
" plot.width(500)\n", | |
" plot.height(500)\n", | |
" plot.zoom(true)\n", | |
" end\n", | |
" points = get_points(poly)\n", | |
" points_x = points.map { |p| p[0] }\n", | |
" points_y = points.map { |p| p[1] }\n", | |
" df = Nyaplot::DataFrame.new({x:points_x,y:points_y})\n", | |
" sc = plot.add_with_df(df, :scatter, :x, :y)\n", | |
" color = \"#\"+ \"%06x\" % (rand * 0xffffff)\n", | |
" sc.color(color)\n", | |
" plot.show if showplot\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 24, | |
"text": [ | |
":plot_poly" | |
] | |
} | |
], | |
"prompt_number": 24 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"plot_polys(cluster_polygons)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div id='vis-2b3756b1-88e8-4390-826c-231a36eb4fd0'></div>\n", | |
"<script>\n", | |
"(function(){\n", | |
" var render = function(){\n", | |
" var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#00bfec\"},\"data\":\"86e68646-fba8-4acd-8b17-082a080a591c\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#935774\"},\"data\":\"e6f3a8e9-6cf9-4723-bf04-39952070c3ce\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#d52cca\"},\"data\":\"0ea89e6f-33b2-438a-a274-fd80f6a09672\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#f9c8b0\"},\"data\":\"b74703f9-7159-40d3-9b63-edf7d496fae9\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#4b8057\"},\"data\":\"fd253884-5423-4742-a8c8-6c0409ed547c\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#5f4c55\"},\"data\":\"f4cac916-38de-487f-b47b-b3fe908782fb\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#652be2\"},\"data\":\"27e5d698-c8db-4964-90f3-dfa0b8f3186b\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#db0832\"},\"data\":\"c8f1302b-4de3-4499-a9d9-cf9444a45ab7\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#91a3cc\"},\"data\":\"25e44099-fd37-458d-a79e-013189a893d8\"}],\"options\":{\"width\":500,\"height\":500,\"zoom\":true,\"xrange\":[-73.98632571101189,-73.98617953062057],\"yrange\":[40.735547874977094,40.73570995781772]}}],\"data\":{\"86e68646-fba8-4acd-8b17-082a080a591c\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617},{\"x\":-73.98627072572708,\"y\":40.735547874977094},{\"x\":-73.98632504045963,\"y\":40.73557226364293},{\"x\":-73.98622445762157,\"y\":40.73570995781772},{\"x\":-73.9861835539341,\"y\":40.73569268254945},{\"x\":-73.98621775209902,\"y\":40.735640856717666},{\"x\":-73.98620970547199,\"y\":40.7356342514617}],\"e6f3a8e9-6cf9-4723-bf04-39952070c3ce\":[{\"x\":-73.98620769381522,\"y\":40.73563526765495},{\"x\":-73.9862660318613,\"y\":40.735547874977094},{\"x\":-73.98632504045963,\"y\":40.7355707393515
66},{\"x\":-73.98622579872608,\"y\":40.73570944972167},{\"x\":-73.98618154227734,\"y\":40.73569217445325},{\"x\":-73.98621775209902,\"y\":40.73563933242788},{\"x\":-73.98620769381522,\"y\":40.73563526765495}],\"0ea89e6f-33b2-438a-a274-fd80f6a09672\":[{\"x\":-73.98632369935513,\"y\":40.735570739351566},{\"x\":-73.98622512817383,\"y\":40.73570944972167},{\"x\":-73.98618154227734,\"y\":40.73569014206842},{\"x\":-73.98621909320354,\"y\":40.735640856717666},{\"x\":-73.98620970547199,\"y\":40.73563526765495},{\"x\":-73.98627005517483,\"y\":40.73554889117169},{\"x\":-73.98632369935513,\"y\":40.735570739351566}],\"b74703f9-7159-40d3-9b63-edf7d496fae9\":[{\"x\":-73.98621842265129,\"y\":40.7356423810074},{\"x\":-73.98620903491974,\"y\":40.73563577575159},{\"x\":-73.98627139627934,\"y\":40.735547874977094},{\"x\":-73.98632436990738,\"y\":40.735571755545806},{\"x\":-73.98622579872608,\"y\":40.73570995781772},{\"x\":-73.98618087172508,\"y\":40.735689633972214},{\"x\":-73.98621842265129,\"y\":40.7356423810074}],\"fd253884-5423-4742-a8c8-6c0409ed547c\":[{\"x\":-73.98626938462257,\"y\":40.73554889117167},{\"x\":-73.98632369935513,\"y\":40.735572771740024},{\"x\":-73.98622445762157,\"y\":40.73570894162559},{\"x\":-73.98618154227734,\"y\":40.73569065016463},{\"x\":-73.98621775209902,\"y\":40.735640856717666},{\"x\":-73.98620836436749,\"y\":40.735634251461676},{\"x\":-73.98626938462257,\"y\":40.73554889117167}],\"f4cac916-38de-487f-b47b-b3fe908782fb\":[{\"x\":-73.98632571101189,\"y\":40.735571755545806},{\"x\":-73.98622378706932,\"y\":40.73570995781772},{\"x\":-73.98618288338184,\"y\":40.73569268254945},{\"x\":-73.98621775209902,\"y\":40.73564034862108},{\"x\":-73.9862110465765,\"y\":40.7356362838482},{\"x\":-73.98627005517483,\"y\":40.735550923560815},{\"x\":-73.98632571101189,\"y\":40.735571755545806}],\"27e5d698-c8db-4964-90f3-dfa0b8f3186b\":[{\"x\":-73.98620970547199,\"y\":40.73563475955834},{\"x\":-73.98627005517483,\"y\":40.73554990736624},{\"x\":-73.98632369935513,\"y\":40.7355
71755545806},{\"x\":-73.98622360456956,\"y\":40.73570641325812},{\"x\":-73.9861848950386,\"y\":40.735689633972214},{\"x\":-73.98621842265129,\"y\":40.735640856717666},{\"x\":-73.98620970547199,\"y\":40.73563475955834}],\"c8f1302b-4de3-4499-a9d9-cf9444a45ab7\":[{\"x\":-73.98621775209902,\"y\":40.73563984052446},{\"x\":-73.98620836436749,\"y\":40.73563272717173},{\"x\":-73.98626938462257,\"y\":40.735550415463514},{\"x\":-73.98632235825062,\"y\":40.73557124744871},{\"x\":-73.98622360456956,\"y\":40.73570641325812},{\"x\":-73.98618768252459,\"y\":40.73568957578454},{\"x\":-73.98621775209902,\"y\":40.73563984052446}],\"25e44099-fd37-458d-a79e-013189a893d8\":[{\"x\":-73.98621909320354,\"y\":40.735638316234656},{\"x\":-73.98620836436749,\"y\":40.7356362838482},{\"x\":-73.98620769381522,\"y\":40.73563577575159},{\"x\":-73.98627005517483,\"y\":40.73554939926897},{\"x\":-73.98632302880287,\"y\":40.73557023125444},{\"x\":-73.98622360456956,\"y\":40.73570641325812},{\"x\":-73.98617953062057,\"y\":40.735689633972214},{\"x\":-73.98621909320354,\"y\":40.735638316234656}]},\"extension\":[]}\n", | |
" Nyaplot.core.parse(model, '#vis-2b3756b1-88e8-4390-826c-231a36eb4fd0');\n", | |
" };\n", | |
" if(window['Nyaplot']==undefined){\n", | |
" window.addEventListener('load_nyaplot', render, false);\n", | |
"\treturn;\n", | |
" }\n", | |
" render();\n", | |
"})();\n", | |
"</script>\n" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 25, | |
"text": [ | |
"\"<div id='vis-2b3756b1-88e8-4390-826c-231a36eb4fd0'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#00bfec\\\"},\\\"data\\\":\\\"86e68646-fba8-4acd-8b17-082a080a591c\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#935774\\\"},\\\"data\\\":\\\"e6f3a8e9-6cf9-4723-bf04-39952070c3ce\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#d52cca\\\"},\\\"data\\\":\\\"0ea89e6f-33b2-438a-a274-fd80f6a09672\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#f9c8b0\\\"},\\\"data\\\":\\\"b74703f9-7159-40d3-9b63-edf7d496fae9\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#4b8057\\\"},\\\"data\\\":\\\"fd253884-5423-4742-a8c8-6c0409ed547c\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#5f4c55\\\"},\\\"data\\\":\\\"f4cac916-38de-487f-b47b-b3fe908782fb\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#652be2\\\"},\\\"data\\\":\\\"27e5d698-c8db-4964-90f3-dfa0b8f3186b\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#db0832\\\"},\\\"data\\\":\\\"c8f1302b-4de3-4499-a9d9-cf9444a45ab7\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#91a3cc\\\"},\\\"data\\\":\\\"25e44099-fd37-458d-a79e-013189a893d8\\\"}],\\\"options\\\":{\\\"width\\\":500,\\\"height\\\":500,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98632571101189,-73.98617953062057],\\\"yrange\\\":[40.735547874977094,40.73570995781772]}}],\\\"data\\\":{\\\"86e68646-fba8-4acd-8b17-082a080a591c\\\
":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617},{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.735547874977094},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772},{\\\"x\\\":-73.9861835539341,\\\"y\\\":40.73569268254945},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617}],\\\"e6f3a8e9-6cf9-4723-bf04-39952070c3ce\\\":[{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495}],\\\"0ea89e6f-33b2-438a-a274-fd80f6a09672\\\":[{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566}],\\\"b74703f9-7159-40d3-9b63-edf7d496fae9\\\":[{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159},{\\\"x\\\":-73.98627139627934,\\\"y\\\":40.735547874977094},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074}],\\\"fd253884-5423-4742-a8c8-6c0409ed547c\\\":[{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024},{\\\"x\\\":-73.98622445762157,\\\"
y\\\":40.73570894162559},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.735634251461676},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167}],\\\"f4cac916-38de-487f-b47b-b3fe908782fb\\\":[{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815},{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806}],\\\"27e5d698-c8db-4964-90f3-dfa0b8f3186b\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834}],\\\"c8f1302b-4de3-4499-a9d9-cf9444a45ab7\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.73563272717173},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446}],\\\"25e44099-fd37-458d-a79e-013189a893d8\\\":[{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444},{\\\"x\\\":
-73.98622360456956,\\\"y\\\":40.73570641325812},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-2b3756b1-88e8-4390-826c-231a36eb4fd0');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
] | |
} | |
], | |
"prompt_number": 25 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's cluster these points. Below is a function that extracts the points from a list of polygons:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def get_all_poly_points(polys)\n", | |
" points = []\n", | |
" polys.each do |poly|\n", | |
" points.push(get_points(poly))\n", | |
" end\n", | |
" return points\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 23, | |
"text": [ | |
":get_all_poly_points" | |
] | |
} | |
], | |
"prompt_number": 23 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"cluster_poly_points = get_all_poly_points(cluster_polygons)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 26, | |
"text": [ | |
"[[[-73.98620970547199, 40.7356342514617], [-73.98627072572708, 40.735547874977094], [-73.98632504045963, 40.73557226364293], [-73.98622445762157, 40.73570995781772], [-73.9861835539341, 40.73569268254945], [-73.98621775209902, 40.735640856717666], [-73.98620970547199, 40.7356342514617]], [[-73.98620769381522, 40.73563526765495], [-73.9862660318613, 40.735547874977094], [-73.98632504045963, 40.735570739351566], [-73.98622579872608, 40.73570944972167], [-73.98618154227734, 40.73569217445325], [-73.98621775209902, 40.73563933242788], [-73.98620769381522, 40.73563526765495]], [[-73.98632369935513, 40.735570739351566], [-73.98622512817383, 40.73570944972167], [-73.98618154227734, 40.73569014206842], [-73.98621909320354, 40.735640856717666], [-73.98620970547199, 40.73563526765495], [-73.98627005517483, 40.73554889117169], [-73.98632369935513, 40.735570739351566]], [[-73.98621842265129, 40.7356423810074], [-73.98620903491974, 40.73563577575159], [-73.98627139627934, 40.735547874977094], [-73.98632436990738, 40.735571755545806], [-73.98622579872608, 40.73570995781772], [-73.98618087172508, 40.735689633972214], [-73.98621842265129, 40.7356423810074]], [[-73.98626938462257, 40.73554889117167], [-73.98632369935513, 40.735572771740024], [-73.98622445762157, 40.73570894162559], [-73.98618154227734, 40.73569065016463], [-73.98621775209902, 40.735640856717666], [-73.98620836436749, 40.735634251461676], [-73.98626938462257, 40.73554889117167]], [[-73.98632571101189, 40.735571755545806], [-73.98622378706932, 40.73570995781772], [-73.98618288338184, 40.73569268254945], [-73.98621775209902, 40.73564034862108], [-73.9862110465765, 40.7356362838482], [-73.98627005517483, 40.735550923560815], [-73.98632571101189, 40.735571755545806]], [[-73.98620970547199, 40.73563475955834], [-73.98627005517483, 40.73554990736624], [-73.98632369935513, 40.735571755545806], [-73.98622360456956, 40.73570641325812], [-73.9861848950386, 40.735689633972214], [-73.98621842265129, 40.735640856717666], 
[-73.98620970547199, 40.73563475955834]], [[-73.98621775209902, 40.73563984052446], [-73.98620836436749, 40.73563272717173], [-73.98626938462257, 40.735550415463514], [-73.98632235825062, 40.73557124744871], [-73.98622360456956, 40.73570641325812], [-73.98618768252459, 40.73568957578454], [-73.98621775209902, 40.73563984052446]], [[-73.98621909320354, 40.735638316234656], [-73.98620836436749, 40.7356362838482], [-73.98620769381522, 40.73563577575159], [-73.98627005517483, 40.73554939926897], [-73.98632302880287, 40.73557023125444], [-73.98622360456956, 40.73570641325812], [-73.98617953062057, 40.735689633972214], [-73.98621909320354, 40.735638316234656]]]" | |
] | |
} | |
], | |
"prompt_number": 26 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The better \u03b5 value I found for these points is a bit more complicated. If it is too big, the L-shape will be lost: points in that corner will be clustered together. After fiddling around I found a decent value of of $6(10^{-6})$.\n", | |
"\n", | |
"An important aspect to account for here is that the GeoJSON spec requires that the coordinate array has to begin _and end_ with the _same point_. Therefore this point would be **counted twice** if we leave the array as-is. Below the resulting clustering function, corresponding test, and plot:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def cluster_points(original_points)\n", | |
" # exclude first item in each poly since it is same as last\n", | |
" unique_points = original_points.map{|poly| poly[1..-1]}\n", | |
" dbscan = DBSCAN( unique_points.flatten(1), :epsilon => 6e-06, :min_points => 2, :distance => :euclidean_distance )\n", | |
" return dbscan.results\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 27, | |
"text": [ | |
":cluster_points" | |
] | |
} | |
], | |
"prompt_number": 27 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"vertex_clusters = cluster_points(cluster_poly_points)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 28, | |
"text": [ | |
"{0=>[[-73.98627072572708, 40.735547874977094], [-73.9862660318613, 40.735547874977094], [-73.98627005517483, 40.73554889117169], [-73.98627139627934, 40.735547874977094], [-73.98626938462257, 40.73554889117167], [-73.98627005517483, 40.735550923560815], [-73.98627005517483, 40.73554990736624], [-73.98626938462257, 40.735550415463514], [-73.98627005517483, 40.73554939926897]], 1=>[[-73.98632504045963, 40.73557226364293], [-73.98632504045963, 40.735570739351566], [-73.98632369935513, 40.735570739351566], [-73.98632436990738, 40.735571755545806], [-73.98632369935513, 40.735572771740024], [-73.98632571101189, 40.735571755545806], [-73.98632369935513, 40.735571755545806], [-73.98632235825062, 40.73557124744871], [-73.98632302880287, 40.73557023125444]], 2=>[[-73.98622445762157, 40.73570995781772], [-73.98622579872608, 40.73570944972167], [-73.98622512817383, 40.73570944972167], [-73.98622579872608, 40.73570995781772], [-73.98622445762157, 40.73570894162559], [-73.98622378706932, 40.73570995781772], [-73.98622360456956, 40.73570641325812], [-73.98622360456956, 40.73570641325812], [-73.98622360456956, 40.73570641325812]], 3=>[[-73.9861835539341, 40.73569268254945], [-73.98618154227734, 40.73569217445325], [-73.98618154227734, 40.73569014206842], [-73.98618087172508, 40.735689633972214], [-73.98618154227734, 40.73569065016463], [-73.98618288338184, 40.73569268254945], [-73.9861848950386, 40.735689633972214], [-73.98618768252459, 40.73568957578454], [-73.98617953062057, 40.735689633972214]], 4=>[[-73.98621775209902, 40.735640856717666], [-73.98621775209902, 40.73563933242788], [-73.98621909320354, 40.735640856717666], [-73.98621842265129, 40.7356423810074], [-73.98621775209902, 40.73564034862108], [-73.98621842265129, 40.735640856717666], [-73.98621775209902, 40.73563984052446], [-73.98621909320354, 40.735638316234656], [-73.98621775209902, 40.735640856717666]], 5=>[[-73.98620970547199, 40.7356342514617], [-73.98620769381522, 40.73563526765495], [-73.98620970547199, 
40.73563526765495], [-73.98620903491974, 40.73563577575159], [-73.98620836436749, 40.735634251461676], [-73.9862110465765, 40.7356362838482], [-73.98620970547199, 40.73563475955834], [-73.98620836436749, 40.73563272717173], [-73.98620836436749, 40.7356362838482], [-73.98620769381522, 40.73563577575159]]}" | |
] | |
} | |
], | |
"prompt_number": 28 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"plot = plot_clusters(vertex_clusters)\n", | |
"plot.show" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div id='vis-f14ed6de-6510-4068-9d5b-fd791dedef9c'></div>\n", | |
"<script>\n", | |
"(function(){\n", | |
" var render = function(){\n", | |
" var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#bdbc87\"},\"data\":\"d10bbc3f-93c7-4fb2-95eb-898231f92cd6\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#540b62\"},\"data\":\"5484f796-d555-4618-8155-726ebace491a\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#d70091\"},\"data\":\"b86a30c7-2d30-4a13-8267-9504a97278c4\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#032eb6\"},\"data\":\"12f2b5e4-7154-4d64-a1df-f2decfe6d82f\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#1442a1\"},\"data\":\"3795f8cd-5c80-407c-ac10-374d3d5dadb4\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#6c1aa2\"},\"data\":\"b5757dff-56c7-409f-9497-7815910277e3\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"xrange\":[-73.98633571101189,-73.98616953062057],\"yrange\":[40.73553787497709,40.73571995781772]}}],\"data\":{\"d10bbc3f-93c7-4fb2-95eb-898231f92cd6\":[{\"x\":-73.98627072572708,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.9862660318613,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554889117169,\"cluster\":0},{\"x\":-73.98627139627934,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.73554889117167,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.735550923560815,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554990736624,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.735550415463514,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554939926897,\"cluster\":0}],\"5484f796-d555-4618-8155-726ebace491a\":[{\"x\":-73.98632504045963,\"y\":40.73557226364293,\"cluster\":1},{\"x\":-73.98632504045963,\"y\":40.7355707
39351566,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632436990738,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735572771740024,\"cluster\":1},{\"x\":-73.98632571101189,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632235825062,\"y\":40.73557124744871,\"cluster\":1},{\"x\":-73.98632302880287,\"y\":40.73557023125444,\"cluster\":1}],\"b86a30c7-2d30-4a13-8267-9504a97278c4\":[{\"x\":-73.98622445762157,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622512817383,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622445762157,\"y\":40.73570894162559,\"cluster\":2},{\"x\":-73.98622378706932,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2}],\"12f2b5e4-7154-4d64-a1df-f2decfe6d82f\":[{\"x\":-73.9861835539341,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569217445325,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569014206842,\"cluster\":3},{\"x\":-73.98618087172508,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569065016463,\"cluster\":3},{\"x\":-73.98618288338184,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.9861848950386,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618768252459,\"y\":40.73568957578454,\"cluster\":3},{\"x\":-73.98617953062057,\"y\":40.735689633972214,\"cluster\":3}],\"3795f8cd-5c80-407c-ac10-374d3d5dadb4\":[{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563933242788,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621842
265129,\"y\":40.7356423810074,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73564034862108,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563984052446,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735638316234656,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4}],\"b5757dff-56c7-409f-9497-7815910277e3\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620903491974,\"y\":40.73563577575159,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.735634251461676,\"cluster\":5},{\"x\":-73.9862110465765,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563475955834,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.73563272717173,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563577575159,\"cluster\":5}]},\"extension\":[]}\n", | |
" Nyaplot.core.parse(model, '#vis-f14ed6de-6510-4068-9d5b-fd791dedef9c');\n", | |
" };\n", | |
" if(window['Nyaplot']==undefined){\n", | |
" window.addEventListener('load_nyaplot', render, false);\n", | |
"\treturn;\n", | |
" }\n", | |
" render();\n", | |
"})();\n", | |
"</script>\n" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 29, | |
"text": [ | |
"\"<div id='vis-f14ed6de-6510-4068-9d5b-fd791dedef9c'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#bdbc87\\\"},\\\"data\\\":\\\"d10bbc3f-93c7-4fb2-95eb-898231f92cd6\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#540b62\\\"},\\\"data\\\":\\\"5484f796-d555-4618-8155-726ebace491a\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#d70091\\\"},\\\"data\\\":\\\"b86a30c7-2d30-4a13-8267-9504a97278c4\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#032eb6\\\"},\\\"data\\\":\\\"12f2b5e4-7154-4d64-a1df-f2decfe6d82f\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#1442a1\\\"},\\\"data\\\":\\\"3795f8cd-5c80-407c-ac10-374d3d5dadb4\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#6c1aa2\\\"},\\\"data\\\":\\\"b5757dff-56c7-409f-9497-7815910277e3\\\"}],\\\"options\\\":{\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98633571101189,-73.98616953062057],\\\"yrange\\\":[40.73553787497709,40.73571995781772]}}],\\\"data\\\":{\\\"d10bbc3f-93c7-4fb2-95eb-898231f92cd6\\\":[{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169,\\\"cluster\\\":0},{\\\"x\\\":-73.98627139627934,\\\"y\\\"
:40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897,\\\"cluster\\\":0}],\\\"5484f796-d555-4618-8155-726ebace491a\\\":[{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293,\\\"cluster\\\":1},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024,\\\"cluster\\\":1},{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871,\\\"cluster\\\":1},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444,\\\"cluster\\\":1}],\\\"b86a30c7-2d30-4a13-8267-9504a97278c4\\\":[{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570894162559,\\\"cluster\\\":2},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2}],\\\"12f2b5e4-7154-4d64-a1df-f2decfe6d82f\\\":[{\\\"x\\\":-73.9861835539341,\\\"y\\\":40
.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842,\\\"cluster\\\":3},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463,\\\"cluster\\\":3},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454,\\\"cluster\\\":3},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3}],\\\"3795f8cd-5c80-407c-ac10-374d3d5dadb4\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4}],\\\"b5757dff-56c7-409f-9497-7815910277e3\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.735634251461676,\\\"cluster\\\":5},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834,\\\"cluster\\\":5},{\\\"x\\\":-73.9862
0836436749,\\\"y\\\":40.73563272717173,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-f14ed6de-6510-4068-9d5b-fd791dedef9c');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
] | |
} | |
], | |
"prompt_number": 29 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 3. Finding the mean polygon\n", | |
"\n", | |
"Now we iterate through each vertex cluster and:\n", | |
"\n", | |
"1. find the mean vertex\n", | |
"1. connect the mean vertices into a mean polygon\n", | |
"\n", | |
"For this we need some extra functions in the `Array` object to find the mean value:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"class Array\n", | |
" def sum\n", | |
" inject(0.0) { |result, el| result + el }\n", | |
" end\n", | |
"\n", | |
" def mean \n", | |
" sum / size\n", | |
" end\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 30, | |
"text": [ | |
":mean" | |
] | |
} | |
], | |
"prompt_number": 30 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we need a function that receives the vertex clusters and returns the average vertex for each cluster:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def get_mean_poly(clusters)\n", | |
" means = {}\n", | |
" clusters.each do |cluster|\n", | |
" if cluster[0] != -1 # ignore cluster -1 because not enough points\n", | |
" lon = cluster[1].map {|c| c[0]}.mean\n", | |
" lat = cluster[1].map {|c| c[1]}.mean\n", | |
" means[cluster[0]] = [lon,lat]\n", | |
" end\n", | |
" end\n", | |
" return means\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 31, | |
"text": [ | |
":get_mean_poly" | |
] | |
} | |
], | |
"prompt_number": 31 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We test this function with our vertex clusters and plot both (mean vertices as yellow diamonds):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"mean_poly = get_mean_poly(vertex_clusters)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 32, | |
"text": [ | |
"{0=>[-73.9862696826458, 40.73554911699269], 1=>[-73.98632407188416, 40.73557147326963], 2=>[-73.98622447129412, 40.73570855047738], 3=>[-73.98618267156186, 40.73569075660959], 4=>[-73.98621819913387, 40.73564040507624], 5=>[-73.98620896786451, 40.735635064416286]}" | |
] | |
} | |
], | |
"prompt_number": 32 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# plot clusters with overlaid (yellow) mean points\n", | |
"plot = plot_clusters(vertex_clusters)\n", | |
"# add means\n", | |
"m_x = mean_poly.map { |m| m[1][0] }\n", | |
"m_y = mean_poly.map { |m| m[1][1] }\n", | |
"sc = plot.add(:scatter, m_x, m_y)\n", | |
"color = \"#ffff00\"\n", | |
"sc.color(color)\n", | |
"sc.shape('diamond')\n", | |
"plot.show" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div id='vis-58d32c47-9127-47bb-aade-6e05ae75616c'></div>\n", | |
"<script>\n", | |
"(function(){\n", | |
" var render = function(){\n", | |
" var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#1febf2\"},\"data\":\"aa7e5490-3987-43bc-a36c-bdfef261e865\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#8ec176\"},\"data\":\"b6a991f8-b7f5-46b7-a9d6-d98ff2c277e4\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#da94f9\"},\"data\":\"cfc1d108-6931-4748-b902-29d43a3e44ef\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#10745a\"},\"data\":\"e7c6c5f6-4e7c-4474-b9a1-bf1e37f5ea7f\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#d73c59\"},\"data\":\"543380ec-8293-4c6a-96db-197b5e3edae5\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#af424a\"},\"data\":\"857d48d5-33d7-4199-bba1-4aae2b7f6adb\"},{\"type\":\"scatter\",\"options\":{\"x\":\"data0\",\"y\":\"data1\",\"color\":\"#ffff00\",\"shape\":\"diamond\"},\"data\":\"cb52daa3-be38-4143-9bee-83da9230dfa5\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"xrange\":[-73.98633571101189,-73.98616953062057],\"yrange\":[40.73553787497709,40.73571995781772]}}],\"data\":{\"aa7e5490-3987-43bc-a36c-bdfef261e865\":[{\"x\":-73.98627072572708,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.9862660318613,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554889117169,\"cluster\":0},{\"x\":-73.98627139627934,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.73554889117167,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.735550923560815,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554990736624,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.735550415463514,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554939926897,\"
cluster\":0}],\"b6a991f8-b7f5-46b7-a9d6-d98ff2c277e4\":[{\"x\":-73.98632504045963,\"y\":40.73557226364293,\"cluster\":1},{\"x\":-73.98632504045963,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632436990738,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735572771740024,\"cluster\":1},{\"x\":-73.98632571101189,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632235825062,\"y\":40.73557124744871,\"cluster\":1},{\"x\":-73.98632302880287,\"y\":40.73557023125444,\"cluster\":1}],\"cfc1d108-6931-4748-b902-29d43a3e44ef\":[{\"x\":-73.98622445762157,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622512817383,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622445762157,\"y\":40.73570894162559,\"cluster\":2},{\"x\":-73.98622378706932,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2}],\"e7c6c5f6-4e7c-4474-b9a1-bf1e37f5ea7f\":[{\"x\":-73.9861835539341,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569217445325,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569014206842,\"cluster\":3},{\"x\":-73.98618087172508,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569065016463,\"cluster\":3},{\"x\":-73.98618288338184,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.9861848950386,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618768252459,\"y\":40.73568957578454,\"cluster\":3},{\"x\":-73.98617953062057,\"y\":40.735689633972214,\"cluster\":3}],\"543380ec-8293-4c6a-96db-197b5e3edae5\":[{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"
cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563933242788,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.7356423810074,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73564034862108,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563984052446,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735638316234656,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4}],\"857d48d5-33d7-4199-bba1-4aae2b7f6adb\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620903491974,\"y\":40.73563577575159,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.735634251461676,\"cluster\":5},{\"x\":-73.9862110465765,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563475955834,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.73563272717173,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563577575159,\"cluster\":5}],\"cb52daa3-be38-4143-9bee-83da9230dfa5\":[{\"data0\":-73.9862696826458,\"data1\":40.73554911699269},{\"data0\":-73.98632407188416,\"data1\":40.73557147326963},{\"data0\":-73.98622447129412,\"data1\":40.73570855047738},{\"data0\":-73.98618267156186,\"data1\":40.73569075660959},{\"data0\":-73.98621819913387,\"data1\":40.73564040507624},{\"data0\":-73.98620896786451,\"data1\":40.735635064416286}]},\"extension\":[]}\n", | |
" Nyaplot.core.parse(model, '#vis-58d32c47-9127-47bb-aade-6e05ae75616c');\n", | |
" };\n", | |
" if(window['Nyaplot']==undefined){\n", | |
" window.addEventListener('load_nyaplot', render, false);\n", | |
"\treturn;\n", | |
" }\n", | |
" render();\n", | |
"})();\n", | |
"</script>\n" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 33, | |
"text": [ | |
"\"<div id='vis-58d32c47-9127-47bb-aade-6e05ae75616c'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#1febf2\\\"},\\\"data\\\":\\\"aa7e5490-3987-43bc-a36c-bdfef261e865\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#8ec176\\\"},\\\"data\\\":\\\"b6a991f8-b7f5-46b7-a9d6-d98ff2c277e4\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#da94f9\\\"},\\\"data\\\":\\\"cfc1d108-6931-4748-b902-29d43a3e44ef\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#10745a\\\"},\\\"data\\\":\\\"e7c6c5f6-4e7c-4474-b9a1-bf1e37f5ea7f\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#d73c59\\\"},\\\"data\\\":\\\"543380ec-8293-4c6a-96db-197b5e3edae5\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#af424a\\\"},\\\"data\\\":\\\"857d48d5-33d7-4199-bba1-4aae2b7f6adb\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\",\\\"color\\\":\\\"#ffff00\\\",\\\"shape\\\":\\\"diamond\\\"},\\\"data\\\":\\\"cb52daa3-be38-4143-9bee-83da9230dfa5\\\"}],\\\"options\\\":{\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98633571101189,-73.98616953062057],\\\"yrange\\\":[40.73553787497709,40.73571995781772]}}],\\\"data\\\":{\\\"aa7e5490-3987-43bc-a36c-bdfef261e865\\\":[{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.7355478749770
94,\\\"cluster\\\":0},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169,\\\"cluster\\\":0},{\\\"x\\\":-73.98627139627934,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897,\\\"cluster\\\":0}],\\\"b6a991f8-b7f5-46b7-a9d6-d98ff2c277e4\\\":[{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293,\\\"cluster\\\":1},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024,\\\"cluster\\\":1},{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871,\\\"cluster\\\":1},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444,\\\"cluster\\\":1}],\\\"cfc1d108-6931-4748-b902-29d43a3e44ef\\\":[{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570894162559,\\\"cluster\\\":2},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.986223604569
56,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2}],\\\"e7c6c5f6-4e7c-4474-b9a1-bf1e37f5ea7f\\\":[{\\\"x\\\":-73.9861835539341,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842,\\\"cluster\\\":3},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463,\\\"cluster\\\":3},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454,\\\"cluster\\\":3},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3}],\\\"543380ec-8293-4c6a-96db-197b5e3edae5\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4}],\\\"857d48d5-33d7-4199-bba1-4aae2b7f6adb\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"
y\\\":40.735634251461676,\\\"cluster\\\":5},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.73563272717173,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5}],\\\"cb52daa3-be38-4143-9bee-83da9230dfa5\\\":[{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269},{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963},{\\\"data0\\\":-73.98622447129412,\\\"data1\\\":40.73570855047738},{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959},{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624},{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-58d32c47-9127-47bb-aade-6e05ae75616c');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
] | |
} | |
], | |
"prompt_number": 33 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 4. Connecting it all\n", | |
"\n", | |
"So far we have a set of points that seem to be the most likely vertices of the mean polygon drawn by our contributors. However, there are **many ways in which these points could be connected to each other**.\n", | |
"\n", | |
"**DISCLAIMER**:\n", | |
"\n", | |
"What follows is a _very_ primitive process that I used to determine the most likely connection between those points. This process is the best I could come up with given my limited math knowledge and time. If you have a better idea of how to do this in Ruby please tweet me at [@mgiraldo](https://twitter.com/mgiraldo).\n", | |
"\n", | |
"**/DISCLAIMER**\n", | |
"\n", | |
"Before going through with connections we need to validate that we have a reasonable amount of clusters to work with: some vertices may be drawn far away enough for them to not cluster properly and therefore no cluster will be produced. We do this by determining the mean vertices in each polygon ($\\bar{m}$) and comparing it with the cluster count ($\\sum c$). Right now: $\\bar{m}\\leq\\sum c$ , so we should have at least _as many_ clusters as we have average points per polygon.\n", | |
"\n", | |
"Not perfect but works most of the time:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def validate_clusters(clusters, original_points)\n", | |
" unique_points = original_points.map{|poly| poly[1..-1]}\n", | |
" average = (unique_points.flatten.count.to_f / (unique_points.size * 2).to_f).round\n", | |
" return clusters.select{|k,v| k!=-1}.size >= average\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 34, | |
"text": [ | |
":validate_clusters" | |
] | |
} | |
], | |
"prompt_number": 34 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"validate_clusters(vertex_clusters, cluster_poly_points)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 35, | |
"text": [ | |
"true" | |
] | |
} | |
], | |
"prompt_number": 35 | |
}, | |
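{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To make the arithmetic of this check concrete, here is a small sketch on made-up data. It assumes, like the code above seems to, that every polygon is a closed ring of `[lon, lat]` pairs (first point repeated at the end) and that cluster `-1` holds the points that did not cluster. The name `mean_vertex_count` is hypothetical and not part of this notebook.\n", | |
"\n", | |
"```ruby\n", | |
"# Hypothetical helper: mean number of distinct vertices per polygon,\n", | |
"# assuming every polygon is a closed ring (first point == last point).\n", | |
"def mean_vertex_count(polygons)\n", | |
"  total = polygons.map { |poly| poly.size - 1 }.reduce(0, :+)\n", | |
"  (total.to_f / polygons.size).round\n", | |
"end\n", | |
"\n", | |
"# Made-up data: two closed triangles and one closed square.\n", | |
"polygons = [\n", | |
"  [[0, 0], [1, 0], [0, 1], [0, 0]],\n", | |
"  [[0, 0], [1, 0], [0, 1], [0, 0]],\n", | |
"  [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]\n", | |
"]\n", | |
"mean = mean_vertex_count(polygons) # (3 + 3 + 4) / 3.0 rounds to 3\n", | |
"\n", | |
"# Made-up clusters: three real ones plus the invalid bucket (-1).\n", | |
"clusters = { 0 => [[0, 0]], 1 => [[1, 0]], 2 => [[0, 1]], -1 => [[1, 1]] }\n", | |
"valid_count = clusters.keys.reject { |id| id == -1 }.size\n", | |
"enough = valid_count >= mean\n", | |
"```\n", | |
"\n", | |
"Here three valid clusters meet a rounded mean of three vertices, so the set would pass. Because the mean is rounded, a minority of contributors who drew an extra vertex does not raise the threshold." | |
] | |
}, | |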
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now that this has been verified we proceed to connect.\n", | |
"\n", | |
"The general process to connect mean vertices to each other is:\n", | |
"\n", | |
"1. for each mean vertex:\n", | |
" 1. find the cluster of vertices it represents (from_vertices)\n", | |
" 1. for each vertex in from_vertices:\n", | |
" 1. find the vertex it is connected to (to_vertex)\n", | |
" 1. find the cluster to_vertex belongs to (to_cluster)\n", | |
" 1. add a \"vote\" for to_cluster\n", | |
" 1. tally the votes\n", | |
" 1. the to_cluster with most votes is the connected cluster\n", | |
"1. connect the clusters\n", | |
"1. validate that the connection makes sense (eg: is a [directed cycle graph](http://en.wikipedia.org/wiki/Cycle_graph))\n", | |
"\n", | |
"Below all the corresponding functions:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def find_connected_point(point, original_points)\n", | |
" original_points.each do |poly|\n", | |
" poly.each_with_index do |p,index|\n", | |
" return poly[index+1] if point === p\n", | |
" end\n", | |
" end\n", | |
" return\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 36, | |
"text": [ | |
":find_connected_point" | |
] | |
} | |
], | |
"prompt_number": 36 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def find_cluster_for_point(point, clusters)\n", | |
" clusters.each do |cluster|\n", | |
" cluster[1].each do |p|\n", | |
" return cluster[0] if point === p && cluster[0] != -1\n", | |
" end\n", | |
" end\n", | |
" return\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 37, | |
"text": [ | |
":find_cluster_for_point" | |
] | |
} | |
], | |
"prompt_number": 37 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def connect_clusters(clusters, original_points)\n", | |
" connections = {}\n", | |
" # for each cluster\n", | |
" clusters.each do |cluster|\n", | |
" # for each point in cluster\n", | |
" if cluster[0] != -1 # exclude invalid cluster\n", | |
" cluster_votes = {} # to weigh connection popularity (diff pts might be connected to diff clusters)\n", | |
" cluster[1].each do |point|\n", | |
" # find original point connected to it\n", | |
" connection = find_connected_point(point, original_points)\n", | |
" connected_cluster = find_cluster_for_point(connection, clusters)\n", | |
" # if original point belongs to another cluster\n", | |
" if connected_cluster != nil && connected_cluster != cluster[0]\n", | |
" # vote for the cluster\n", | |
" cluster_votes[connected_cluster] = 0 if cluster_votes[connected_cluster] == nil\n", | |
" cluster_votes[connected_cluster] += 1\n", | |
" end\n", | |
" end\n", | |
" connections[cluster[0]] = cluster_votes.sort_by{|k, v| v}\n", | |
" next if connections[cluster[0]].size == 0\n", | |
" connections[cluster[0]] = connections[cluster[0]].reverse[0][0]\n", | |
" end\n", | |
" end\n", | |
" return connections\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 38, | |
"text": [ | |
":connect_clusters" | |
] | |
} | |
], | |
"prompt_number": 38 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"connections = connect_clusters(vertex_clusters, cluster_poly_points)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 39, | |
"text": [ | |
"{0=>1, 1=>2, 2=>3, 3=>4, 4=>5, 5=>0}" | |
] | |
} | |
], | |
"prompt_number": 39 | |
}, | |
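{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The voting step can also be looked at in isolation. The sketch below uses made-up votes, where each entry is the id of the cluster that one drawn vertex connects to, and picks the most popular target. `most_voted` is a hypothetical name, not a function from this notebook, and ties are not handled in any special way.\n", | |
"\n", | |
"```ruby\n", | |
"# Hypothetical helper: return the target with the most votes,\n", | |
"# or nil when there are no votes at all.\n", | |
"def most_voted(votes)\n", | |
"  counts = Hash.new(0) # missing keys start at 0\n", | |
"  votes.each { |target| counts[target] += 1 }\n", | |
"  winner = counts.max_by { |_target, count| count }\n", | |
"  winner && winner[0]\n", | |
"end\n", | |
"\n", | |
"# Made-up votes cast by the nine points of one cluster: eight contributors\n", | |
"# connected this corner to cluster 1, one connected it to cluster 3.\n", | |
"votes = [1, 1, 1, 3, 1, 1, 1, 1, 1]\n", | |
"winner = most_voted(votes)\n", | |
"no_votes = most_voted([])\n", | |
"```\n", | |
"\n", | |
"Using `Hash.new(0)` avoids the explicit `nil` check before incrementing, and returning `nil` for an empty vote makes a cluster without any outgoing connection easy to detect later." | |
] | |
}, | |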
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As can be seen above this is a directed cycle graph and the end result is a clean path from the first vertex to the last one.\n", | |
"\n", | |
"The fact that the points are sorted (0 to 1, 1 to 2, 2 to 3, and so on) is somewhat coincidential. Below is a basic function that checks the graph and returns a sorted list of clusters (the order we need to follow to draw the mean polygon):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def sort_connections(connections)\n", | |
" # does some simple check for non-circularity \n", | |
" sorted = []\n", | |
" seen = {}\n", | |
" as_list = connections.select{|k,v| k}\n", | |
" done = false\n", | |
" first = as_list.first[0]\n", | |
" from = first\n", | |
" while !done do\n", | |
" to = connections[from]\n", | |
" done = true if seen[to] || to == nil\n", | |
" seen[to] = true\n", | |
" from = to\n", | |
" sorted.push(to)\n", | |
" done = true if seen.size == connections.size\n", | |
" end\n", | |
" return nil if seen.size != connections.size\n", | |
" return sorted\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 40, | |
"text": [ | |
":sort_connections" | |
] | |
} | |
], | |
"prompt_number": 40 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# testing sort function\n", | |
"sort_connections(connections)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 41, | |
"text": [ | |
"[1, 2, 3, 4, 5, 0]" | |
] | |
} | |
], | |
"prompt_number": 41 | |
}, | |
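{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A stricter way to confirm that the connections form a single directed cycle is to follow the successor map for exactly as many steps as there are clusters, and require that no cluster repeats and that the walk ends back at the start. The sketch below illustrates that idea under those assumptions; it is not the function used above. `single_cycle_order` is a hypothetical name, and its result starts at the first key rather than at that key's successor.\n", | |
"\n", | |
"```ruby\n", | |
"# Hypothetical helper: returns the visiting order if `connections`\n", | |
"# (a hash mapping each cluster id to its successor) is one directed\n", | |
"# cycle covering every key, otherwise nil.\n", | |
"def single_cycle_order(connections)\n", | |
"  return nil if connections.empty?\n", | |
"  start = connections.keys.first\n", | |
"  order = []\n", | |
"  seen = {}\n", | |
"  current = start\n", | |
"  connections.size.times do\n", | |
"    # a dead end, an unknown id or a repeated id means no single cycle\n", | |
"    return nil if !connections.key?(current) || seen[current]\n", | |
"    seen[current] = true\n", | |
"    order.push(current)\n", | |
"    current = connections[current]\n", | |
"  end\n", | |
"  # after visiting every key once we must be back where we started\n", | |
"  current == start ? order : nil\n", | |
"end\n", | |
"\n", | |
"good = single_cycle_order({ 0 => 1, 1 => 2, 2 => 3, 3 => 4, 4 => 5, 5 => 0 })\n", | |
"dead_end = single_cycle_order({ 0 => 1, 1 => nil })\n", | |
"two_loops = single_cycle_order({ 0 => 1, 1 => 0, 2 => 3, 3 => 2 })\n", | |
"```\n", | |
"\n", | |
"With the connections found earlier this yields the clusters in drawing order, while a chain that stops early or a graph that splits into two separate loops yields `nil`." | |
] | |
}, | |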
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we can proceed to build our final mean polygon:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def connect_mean_poly(mean_poly, connections)\n", | |
" connected = []\n", | |
" sorted = sort_connections(connections)\n", | |
" return nil if sorted == nil\n", | |
" sorted.each do |c|\n", | |
" connected.push([mean_poly[c][0], mean_poly[c][1]])\n", | |
" end\n", | |
" # for GeoJSON, last == first\n", | |
" first = sorted[0]\n", | |
" connected.push([mean_poly[first][0], mean_poly[first][1]])\n", | |
" return connected\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 42, | |
"text": [ | |
":connect_mean_poly" | |
] | |
} | |
], | |
"prompt_number": 42 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"final_polygon = connect_mean_poly(mean_poly, connections)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 43, | |
"text": [ | |
"[[-73.98632407188416, 40.73557147326963], [-73.98622447129412, 40.73570855047738], [-73.98618267156186, 40.73569075660959], [-73.98621819913387, 40.73564040507624], [-73.98620896786451, 40.735635064416286], [-73.9862696826458, 40.73554911699269], [-73.98632407188416, 40.73557147326963]]" | |
] | |
} | |
], | |
"prompt_number": 43 | |
}, | |
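{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Because the ring is already closed, it can be wrapped as a GeoJSON `Polygon` geometry. Below is a minimal sketch using Ruby's standard `json` library. The ring is shortened, made-up data and `ring_to_geojson` is a hypothetical name; neither comes from this notebook.\n", | |
"\n", | |
"```ruby\n", | |
"require 'json'\n", | |
"\n", | |
"# Hypothetical helper: wrap one closed ring of [lon, lat] pairs as a\n", | |
"# GeoJSON Polygon geometry. A polygon is an array of rings and the\n", | |
"# first ring is the exterior one.\n", | |
"def ring_to_geojson(ring)\n", | |
"  raise ArgumentError, 'ring must be closed' unless ring.first == ring.last\n", | |
"  JSON.generate({ 'type' => 'Polygon', 'coordinates' => [ring] })\n", | |
"end\n", | |
"\n", | |
"# Made-up closed triangle near the example location.\n", | |
"ring = [[-73.9863, 40.7355], [-73.9862, 40.7357], [-73.9861, 40.7356], [-73.9863, 40.7355]]\n", | |
"geojson = ring_to_geojson(ring)\n", | |
"parsed = JSON.parse(geojson)\n", | |
"```\n", | |
"\n", | |
"Raising on an open ring is a design choice for this sketch: it surfaces a missing closing point immediately instead of producing a geometry that a GeoJSON consumer might reject." | |
] | |
}, | |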
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's see how all this looks like:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"plot = plot_clusters(vertex_clusters)\n", | |
"m_x = final_polygon.map { |m| m[0] }\n", | |
"m_y = final_polygon.map { |m| m[1] }\n", | |
"sc = plot.add(:scatter, m_x, m_y)\n", | |
"color = \"#ffff00\"\n", | |
"sc.color(color)\n", | |
"sc.shape('diamond')\n", | |
"# add the MEAN POLYGON\n", | |
"final_polygon.each_with_index do |c, i|\n", | |
" next if i >= final_polygon.size-1\n", | |
" from = [ final_polygon[i][0], final_polygon[i+1][0] ]\n", | |
" to = [ final_polygon[i][1], final_polygon[i+1][1] ]\n", | |
" plot.add(:line, from, to)\n", | |
"end\n", | |
"plot.show" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div id='vis-c99629fa-eb97-40d8-857d-625b06b9dca7'></div>\n", | |
"<script>\n", | |
"(function(){\n", | |
" var render = function(){\n", | |
" var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#654ba4\"},\"data\":\"4b394936-4ba7-4804-bb6f-2b331d237365\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#327493\"},\"data\":\"037d18cc-b414-4ca6-93d6-42667ffd7635\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#6bb404\"},\"data\":\"0f6efa17-762c-4d4b-9fe8-534bfacc13cf\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#b8cd89\"},\"data\":\"6b93cbc3-32b4-4748-afed-8ab856a7a9c7\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#8235a6\"},\"data\":\"f956eb40-5652-4b2e-91cc-fb94a951ae17\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#d9538d\"},\"data\":\"5f64c4c5-803f-4272-b578-04fbea9e9cc1\"},{\"type\":\"scatter\",\"options\":{\"x\":\"data0\",\"y\":\"data1\",\"color\":\"#ffff00\",\"shape\":\"diamond\"},\"data\":\"75b75ec5-1e92-48e1-b65a-d165b4adfd62\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"5ea59c9e-8d54-406b-b36d-a0b96e70c751\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"a62a3ea9-b4fa-461b-8ea5-91f9caeeb108\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"2f582a38-da7b-4c26-81d2-6de4f8bb3a48\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"7fa70935-e0e7-4b48-bf03-49f37cc243f2\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"02f1a255-dad3-4ff5-a5be-832af5ffc7d9\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"5f25db66-f393-49e6-8e06-6464178ecf05\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"xrange\":[-73.98633571101189,-73.
98616953062057],\"yrange\":[40.73553787497709,40.73571995781772]}}],\"data\":{\"4b394936-4ba7-4804-bb6f-2b331d237365\":[{\"x\":-73.98627072572708,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.9862660318613,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554889117169,\"cluster\":0},{\"x\":-73.98627139627934,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.73554889117167,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.735550923560815,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554990736624,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.735550415463514,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554939926897,\"cluster\":0}],\"037d18cc-b414-4ca6-93d6-42667ffd7635\":[{\"x\":-73.98632504045963,\"y\":40.73557226364293,\"cluster\":1},{\"x\":-73.98632504045963,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632436990738,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735572771740024,\"cluster\":1},{\"x\":-73.98632571101189,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632235825062,\"y\":40.73557124744871,\"cluster\":1},{\"x\":-73.98632302880287,\"y\":40.73557023125444,\"cluster\":1}],\"0f6efa17-762c-4d4b-9fe8-534bfacc13cf\":[{\"x\":-73.98622445762157,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622512817383,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622445762157,\"y\":40.73570894162559,\"cluster\":2},{\"x\":-73.98622378706932,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2}],\"6b93cbc3-32b4-4748-afed-8a
b856a7a9c7\":[{\"x\":-73.9861835539341,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569217445325,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569014206842,\"cluster\":3},{\"x\":-73.98618087172508,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569065016463,\"cluster\":3},{\"x\":-73.98618288338184,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.9861848950386,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618768252459,\"y\":40.73568957578454,\"cluster\":3},{\"x\":-73.98617953062057,\"y\":40.735689633972214,\"cluster\":3}],\"f956eb40-5652-4b2e-91cc-fb94a951ae17\":[{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563933242788,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.7356423810074,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73564034862108,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563984052446,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735638316234656,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4}],\"5f64c4c5-803f-4272-b578-04fbea9e9cc1\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620903491974,\"y\":40.73563577575159,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.735634251461676,\"cluster\":5},{\"x\":-73.9862110465765,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563475955834,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.73563272717173,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563577575159,\"cluster\":5}],\"75b75ec5-1e92-48e1-b65a-d165b4adfd62\":[{\"data0\":-73.98632407188416,\"dat
a1\":40.73557147326963},{\"data0\":-73.98622447129412,\"data1\":40.73570855047738},{\"data0\":-73.98618267156186,\"data1\":40.73569075660959},{\"data0\":-73.98621819913387,\"data1\":40.73564040507624},{\"data0\":-73.98620896786451,\"data1\":40.735635064416286},{\"data0\":-73.9862696826458,\"data1\":40.73554911699269},{\"data0\":-73.98632407188416,\"data1\":40.73557147326963}],\"5ea59c9e-8d54-406b-b36d-a0b96e70c751\":[{\"data0\":-73.98632407188416,\"data1\":40.73557147326963},{\"data0\":-73.98622447129412,\"data1\":40.73570855047738}],\"a62a3ea9-b4fa-461b-8ea5-91f9caeeb108\":[{\"data0\":-73.98622447129412,\"data1\":40.73570855047738},{\"data0\":-73.98618267156186,\"data1\":40.73569075660959}],\"2f582a38-da7b-4c26-81d2-6de4f8bb3a48\":[{\"data0\":-73.98618267156186,\"data1\":40.73569075660959},{\"data0\":-73.98621819913387,\"data1\":40.73564040507624}],\"7fa70935-e0e7-4b48-bf03-49f37cc243f2\":[{\"data0\":-73.98621819913387,\"data1\":40.73564040507624},{\"data0\":-73.98620896786451,\"data1\":40.735635064416286}],\"02f1a255-dad3-4ff5-a5be-832af5ffc7d9\":[{\"data0\":-73.98620896786451,\"data1\":40.735635064416286},{\"data0\":-73.9862696826458,\"data1\":40.73554911699269}],\"5f25db66-f393-49e6-8e06-6464178ecf05\":[{\"data0\":-73.9862696826458,\"data1\":40.73554911699269},{\"data0\":-73.98632407188416,\"data1\":40.73557147326963}]},\"extension\":[]}\n", | |
" Nyaplot.core.parse(model, '#vis-c99629fa-eb97-40d8-857d-625b06b9dca7');\n", | |
" };\n", | |
" if(window['Nyaplot']==undefined){\n", | |
" window.addEventListener('load_nyaplot', render, false);\n", | |
"\treturn;\n", | |
" }\n", | |
" render();\n", | |
"})();\n", | |
"</script>\n" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 45, | |
"text": [ | |
"\"<div id='vis-c99629fa-eb97-40d8-857d-625b06b9dca7'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#654ba4\\\"},\\\"data\\\":\\\"4b394936-4ba7-4804-bb6f-2b331d237365\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#327493\\\"},\\\"data\\\":\\\"037d18cc-b414-4ca6-93d6-42667ffd7635\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#6bb404\\\"},\\\"data\\\":\\\"0f6efa17-762c-4d4b-9fe8-534bfacc13cf\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#b8cd89\\\"},\\\"data\\\":\\\"6b93cbc3-32b4-4748-afed-8ab856a7a9c7\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#8235a6\\\"},\\\"data\\\":\\\"f956eb40-5652-4b2e-91cc-fb94a951ae17\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#d9538d\\\"},\\\"data\\\":\\\"5f64c4c5-803f-4272-b578-04fbea9e9cc1\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\",\\\"color\\\":\\\"#ffff00\\\",\\\"shape\\\":\\\"diamond\\\"},\\\"data\\\":\\\"75b75ec5-1e92-48e1-b65a-d165b4adfd62\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"5ea59c9e-8d54-406b-b36d-a0b96e70c751\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"a62a3ea9-b4fa-461b-8ea5-91f9caeeb108\\
\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"2f582a38-da7b-4c26-81d2-6de4f8bb3a48\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"7fa70935-e0e7-4b48-bf03-49f37cc243f2\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"02f1a255-dad3-4ff5-a5be-832af5ffc7d9\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"5f25db66-f393-49e6-8e06-6464178ecf05\\\"}],\\\"options\\\":{\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"xrange\\\":[-73.98633571101189,-73.98616953062057],\\\"yrange\\\":[40.73553787497709,40.73571995781772]}}],\\\"data\\\":{\\\"4b394936-4ba7-4804-bb6f-2b331d237365\\\":[{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169,\\\"cluster\\\":0},{\\\"x\\\":-73.98627139627934,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897,\\\"cluster\\\":0}],\\\"037d18cc-b414-4ca6-93d6-42667ffd7635\\\":[{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293,\\\"cluster\\\":1},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024,\\\"cluster\\\":1},{\\\"x\\\":
-73.98632571101189,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871,\\\"cluster\\\":1},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444,\\\"cluster\\\":1}],\\\"0f6efa17-762c-4d4b-9fe8-534bfacc13cf\\\":[{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570894162559,\\\"cluster\\\":2},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2}],\\\"6b93cbc3-32b4-4748-afed-8ab856a7a9c7\\\":[{\\\"x\\\":-73.9861835539341,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842,\\\"cluster\\\":3},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463,\\\"cluster\\\":3},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454,\\\"cluster\\\":3},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3}],\\\"f956eb40-5652-4b2e-91cc-fb94a951ae17\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788,\\\"cluster\\\":4},{\\\"x\\\":-73.986
21909320354,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4}],\\\"5f64c4c5-803f-4272-b578-04fbea9e9cc1\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.735634251461676,\\\"cluster\\\":5},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.73563272717173,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5}],\\\"75b75ec5-1e92-48e1-b65a-d165b4adfd62\\\":[{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963},{\\\"data0\\\":-73.98622447129412,\\\"data1\\\":40.73570855047738},{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959},{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624},{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286},{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269},{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963}],\\\"5ea59c9e-8d54-406b-b36d-a0b96e70c751\\\":[{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963},{\\\"data0\\\":-73.9862244712941
2,\\\"data1\\\":40.73570855047738}],\\\"a62a3ea9-b4fa-461b-8ea5-91f9caeeb108\\\":[{\\\"data0\\\":-73.98622447129412,\\\"data1\\\":40.73570855047738},{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959}],\\\"2f582a38-da7b-4c26-81d2-6de4f8bb3a48\\\":[{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959},{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624}],\\\"7fa70935-e0e7-4b48-bf03-49f37cc243f2\\\":[{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624},{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286}],\\\"02f1a255-dad3-4ff5-a5be-832af5ffc7d9\\\":[{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286},{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269}],\\\"5f25db66-f393-49e6-8e06-6464178ecf05\\\":[{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269},{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-c99629fa-eb97-40d8-857d-625b06b9dca7');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
] | |
} | |
], | |
"prompt_number": 45 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To wrap it all up, we create a single consensus function that receives a GeoJSON string and returns a list of mean polygons, one for each centroid cluster that passes every validation step:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def calculate_polygonfix_consensus(geojson)\n", | |
"  output = []\n", | |
"  geom = parse(geojson)\n", | |
"  # group the drawn polygons by centroid, since one overlay may cover several buildings\n", | |
"  centroids = get_all_centroids(geom)\n", | |
"  centroid_clusters = cluster_centroids(centroids)\n", | |
"  centroid_clusters.each do |ccluster|\n", | |
"    cluster = ccluster[1] # only the set of latlons\n", | |
"    sub_geom = get_polys_for_centroid_cluster(cluster, centroids, geom)\n", | |
"    next if sub_geom.size == 0\n", | |
"    original_points = get_all_poly_points(sub_geom)\n", | |
"    next if original_points.nil?\n", | |
"    # cluster the vertices; skip this group as soon as any step yields nothing usable\n", | |
"    clusters = cluster_points(original_points)\n", | |
"    next unless validate_clusters(clusters, original_points)\n", | |
"    mean_poly = get_mean_poly(clusters)\n", | |
"    next if mean_poly == {}\n", | |
"    connections = connect_clusters(clusters, original_points)\n", | |
"    next if connections == {}\n", | |
"    poly = connect_mean_poly(mean_poly, connections)\n", | |
"    next if poly.nil? || poly.count == 0\n", | |
"    output.push(poly)\n", | |
"  end\n", | |
"  output\n", | |
"end" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 47, | |
"text": [ | |
":calculate_polygonfix_consensus" | |
] | |
} | |
], | |
"prompt_number": 47 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"consensus = calculate_polygonfix_consensus(geomstr)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 48, | |
"text": [ | |
"[[[-73.98632407188416, 40.73557147326963], [-73.98622447129412, 40.73570855047738], [-73.98618267156186, 40.73569075660959], [-73.98621819913387, 40.73564040507624], [-73.98620896786451, 40.735635064416286], [-73.9862696826458, 40.73554911699269], [-73.98632407188416, 40.73557147326963]]]" | |
] | |
} | |
], | |
"prompt_number": 48 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Wrapping each consensus polygon in a GeoJSON `Feature` produces a `FeatureCollection` like this:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"geo_json = {:type => \"FeatureCollection\", :features => consensus.map { |f| {:type => \"Feature\", :properties => { :id => 1 }, :geometry => { :type => \"Polygon\", :coordinates =>[f] } } } }.to_json\n", | |
"puts geo_json" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"id\":1},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632407188416,40.73557147326963],[-73.98622447129412,40.73570855047738],[-73.98618267156186,40.73569075660959],[-73.98621819913387,40.73564040507624],[-73.98620896786451,40.735635064416286],[-73.9862696826458,40.73554911699269],[-73.98632407188416,40.73557147326963]]]}}]}\n" | |
] | |
} | |
], | |
"prompt_number": 64 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now let's plot the resulting GeoJSON on the original map (purple):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"IRuby.html '<iframe src=\"http://jsfiddle.net/mgiraldo/m4XeU/1/embedded/result/\" width=500 height=400></iframe>'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<iframe src=\"http://jsfiddle.net/mgiraldo/m4XeU/1/embedded/result/\" width=500 height=400></iframe>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 65, | |
"text": [ | |
"\"<iframe src=\\\"http://jsfiddle.net/mgiraldo/m4XeU/1/embedded/result/\\\" width=500 height=400></iframe>\"" | |
] | |
} | |
], | |
"prompt_number": 65 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Voil\u00e0! The mean polygon looks good!" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Conclusion\n", | |
"\n", | |
"This is a first step towards finding geometric consensus from a list of user contributions to a given starting geometry and a map. It is a work in progress and hopefully other ideas can be added to improve this algorithm.\n", | |
"\n", | |
"This code is part of NYPL Labs' [Building Inspector](http://buildinginspector.nypl.org/). Explore and fork the [GitHub repository](https://github.com/NYPL/building-inspector).\n", | |
"\n", | |
"This notebook was created by [Mauricio Giraldo Arteaga](https://twitter.com/mgiraldo)." | |
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
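The GeoJSON wrapping step near the end of the notebook can also be sketched as a small standalone Ruby script. This is only an illustration under stated assumptions: `polygons_to_feature_collection` is a hypothetical helper that does not appear in the notebook, the sequential `id` values are a choice made here (the notebook assigns `1` to every feature), and the ring-closing check is an extra safeguard rather than something the original code performs. The sample polygon is the consensus result printed by the notebook.

```ruby
require 'json'

# Hypothetical helper (not part of the notebook): wraps a list of polygon
# rings into a GeoJSON FeatureCollection, one Feature per polygon.
def polygons_to_feature_collection(polygons)
  features = polygons.each_with_index.map do |ring, index|
    # GeoJSON rings must be closed (first point equals last point);
    # append the first point when the ring is still open.
    closed = ring.first == ring.last ? ring : ring + [ring.first]
    {
      type: "Feature",
      properties: { id: index + 1 }, # sequential id: an assumption of this sketch
      geometry: { type: "Polygon", coordinates: [closed] }
    }
  end
  { type: "FeatureCollection", features: features }
end

# Sample input: the consensus polygon printed by the notebook.
consensus = [[
  [-73.98632407188416, 40.73557147326963],
  [-73.98622447129412, 40.73570855047738],
  [-73.98618267156186, 40.73569075660959],
  [-73.98621819913387, 40.73564040507624],
  [-73.98620896786451, 40.735635064416286],
  [-73.9862696826458, 40.73554911699269],
  [-73.98632407188416, 40.73557147326963]
]]

puts JSON.generate(polygons_to_feature_collection(consensus))
```

Numbering the features from their position keeps the ids distinct when the consensus function returns more than one polygon, which can happen when the overlay covers several building footprints.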
http://nbviewer.ipython.org/urls/gist.githubusercontent.com/domitry/e087d69315075bebe3b1/raw/5110b04d5591c91b2bc269ed41d647bdec682f00/polygonfix%20writeup.ipynb