Last active
February 14, 2017 18:09
Overview of Labelled Array Development work
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Towards a Tabular Data Model for PySAL\n", | |
"\n", | |
"The Google Summer of Code project to add tabular data model to PySAL has achieved many of its original goals. This project primarily involved writing new modules and extending classes to enable common `PySAL` classes to work with `pandas` dataframes. The target of the dataframe was, as stated in the initial specification, a `pandas` dataframe with a column full of `PySAL` geometric objects. In addition, core functionality in the exploratory spatial data analysis and spatial weights construction modules was extended to use these tables. \n", | |
"\n", | |
"In developing this new functionality, new issues were discovered in the code base, which required resolution. Since the subject of the proposal touched every part of the library in addition to building the tools alongside of the library, the changes are far-ranging throughout the library. This notebook provides an explanation of the issues, extension of functionality, and remaining work that was not addressed in the context of the Summer of Code work. " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Contents\n", | |
"\n", | |
"## Overview Sections\n", | |
"### [Pandas Read/Write](#Direct-to-pandas-read/write)\n", | |
"### [New Spatial Weights Functionality](#Weights-in-memory)\n", | |
"### [Spatial Statistics on Tables](#Exploratory-Statistics)\n", | |
"### [Solidifying Soft Dependency Practices](#Exploratory-Statistics)\n", | |
"### [Ancillary Work Encountered During Project](#Closed-Issues)\n", | |
"### [Remaining Work & Next Steps](#Open-Issues)\n", | |
"\n", | |
"\n", | |
"# Github Links\n", | |
"## [All Commits](https://github.com/pysal/pysal/commits/dev?author=ljwolf)\n", | |
"## [Closed Issues](https://github.com/pysal/pysal/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20label%3Aljwolf_GSOC%20) \n", | |
"## [Pull Requests](https://github.com/pysal/pysal/pulls?utf8=%E2%9C%93&q=is%3Apr%20label%3Aljwolf_GSOC%20)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import pysal as ps\n", | |
"import numpy as np\n", | |
"import pandas as pd" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The main benefit/advantage of this work that users of the library will see are the following interaction methods. " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Direct-to-pandas read/write" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Before, our input-output API was heavily oriented around numpy arrays. In a sense, most of the interface code for the IO system was built around providing methods to interact with file handlers in a way that resembled the default python `open` statement. \n", | |
"\n", | |
"Under this focus on emulating `stdlib` behavior, users interacted with file handles by extracting sets of columns from a tabular dataset, reading individual shapes from a shapefile, or reading entire sparse matrix description files at once." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"shape_path = ps.examples.get_path('south.shp')\n", | |
"table_path = ps.examples.get_path('south.dbf')\n", | |
"weights_path = ps.examples.get_path('columbus.gal')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"shape_handler = ps.open(shape_path)\n", | |
"table_handler = ps.open(table_path)\n", | |
"weights_handler = ps.open(weights_path)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"These had some methods that mimicked the standard python `open` statement, and some methods designed explicitly to make it easy to work with tabular data: " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"shape_handler.seek(5) #now on 6th shape, rather than the 1st shape" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<pysal.cg.shapes.Polygon at 0x7f6425db0f50>" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"shape_handler.get(9) #reads the 9th shape" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<pysal.weights.weights.W at 0x7f6425dc0050>" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weights_handler.read()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Specifically, the tabular handlers store the metadata about the file being opened, to facilitate working with the table while keeping it out of memory." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[u'FIPSNO', u'NAME', u'STATE_NAME', u'STATE_FIPS', u'CNTY_FIPS']" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table_handler.header[0:5]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([[u'12.604551644', u'54029', u'Hancock'],\n", | |
" [u'11.24229306', u'54009', u'Brooke'],\n", | |
" [u'17.574021012', u'54069', u'Ohio'],\n", | |
" [u'13.564158661', u'54051', u'Marshall'],\n", | |
" [u'16.380902823', u'10003', u'New Castle']], \n", | |
" dtype='<U32')" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table_handler.by_col_array('FH90', 'FIPSNO', 'NAME')[0:5]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[(u'N', 16, 0), (u'C', 32, 0), (u'C', 25, 0), (u'C', 2, 0), (u'C', 3, 0)]" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table_handler.field_spec[0:5]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[10003, u'New Castle', u'Delaware', u'10', u'003']" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table_handler.read_record(4)[0:5]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"But, using `pandas` provides many more tools for us. Thus, a few methods were added to IO objects, and a separate `read_files` call added." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>FIPSNO</th>\n", | |
" <th>NAME</th>\n", | |
" <th>STATE_NAME</th>\n", | |
" <th>STATE_FIPS</th>\n", | |
" <th>CNTY_FIPS</th>\n", | |
" <th>FIPS</th>\n", | |
" <th>STFIPS</th>\n", | |
" <th>COFIPS</th>\n", | |
" <th>SOUTH</th>\n", | |
" <th>HR60</th>\n", | |
" <th>...</th>\n", | |
" <th>BLK90</th>\n", | |
" <th>GI59</th>\n", | |
" <th>GI69</th>\n", | |
" <th>GI79</th>\n", | |
" <th>GI89</th>\n", | |
" <th>FH60</th>\n", | |
" <th>FH70</th>\n", | |
" <th>FH80</th>\n", | |
" <th>FH90</th>\n", | |
" <th>geometry</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>54029</td>\n", | |
" <td>Hancock</td>\n", | |
" <td>West Virginia</td>\n", | |
" <td>54</td>\n", | |
" <td>029</td>\n", | |
" <td>54029</td>\n", | |
" <td>54</td>\n", | |
" <td>29</td>\n", | |
" <td>1</td>\n", | |
" <td>1.682864</td>\n", | |
" <td>...</td>\n", | |
" <td>2.557262</td>\n", | |
" <td>0.223645</td>\n", | |
" <td>0.295377</td>\n", | |
" <td>0.332251</td>\n", | |
" <td>0.363934</td>\n", | |
" <td>9.981297</td>\n", | |
" <td>7.8</td>\n", | |
" <td>9.785797</td>\n", | |
" <td>12.604552</td>\n", | |
" <td><pysal.cg.shapes.Polygon object at 0x7f642612d...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>54009</td>\n", | |
" <td>Brooke</td>\n", | |
" <td>West Virginia</td>\n", | |
" <td>54</td>\n", | |
" <td>009</td>\n", | |
" <td>54009</td>\n", | |
" <td>54</td>\n", | |
" <td>9</td>\n", | |
" <td>1</td>\n", | |
" <td>4.607233</td>\n", | |
" <td>...</td>\n", | |
" <td>0.748370</td>\n", | |
" <td>0.220407</td>\n", | |
" <td>0.318453</td>\n", | |
" <td>0.314165</td>\n", | |
" <td>0.350569</td>\n", | |
" <td>10.929337</td>\n", | |
" <td>8.0</td>\n", | |
" <td>10.214990</td>\n", | |
" <td>11.242293</td>\n", | |
" <td><pysal.cg.shapes.Polygon object at 0x7f642612d...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>54069</td>\n", | |
" <td>Ohio</td>\n", | |
" <td>West Virginia</td>\n", | |
" <td>54</td>\n", | |
" <td>069</td>\n", | |
" <td>54069</td>\n", | |
" <td>54</td>\n", | |
" <td>69</td>\n", | |
" <td>1</td>\n", | |
" <td>0.974132</td>\n", | |
" <td>...</td>\n", | |
" <td>3.310334</td>\n", | |
" <td>0.272398</td>\n", | |
" <td>0.358454</td>\n", | |
" <td>0.376963</td>\n", | |
" <td>0.390534</td>\n", | |
" <td>15.621643</td>\n", | |
" <td>12.9</td>\n", | |
" <td>14.716681</td>\n", | |
" <td>17.574021</td>\n", | |
" <td><pysal.cg.shapes.Polygon object at 0x7f642612d...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>54051</td>\n", | |
" <td>Marshall</td>\n", | |
" <td>West Virginia</td>\n", | |
" <td>54</td>\n", | |
" <td>051</td>\n", | |
" <td>54051</td>\n", | |
" <td>54</td>\n", | |
" <td>51</td>\n", | |
" <td>1</td>\n", | |
" <td>0.876248</td>\n", | |
" <td>...</td>\n", | |
" <td>0.546097</td>\n", | |
" <td>0.227647</td>\n", | |
" <td>0.319580</td>\n", | |
" <td>0.320953</td>\n", | |
" <td>0.377346</td>\n", | |
" <td>11.962834</td>\n", | |
" <td>8.8</td>\n", | |
" <td>8.803253</td>\n", | |
" <td>13.564159</td>\n", | |
" <td><pysal.cg.shapes.Polygon object at 0x7f642612d...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>10003</td>\n", | |
" <td>New Castle</td>\n", | |
" <td>Delaware</td>\n", | |
" <td>10</td>\n", | |
" <td>003</td>\n", | |
" <td>10003</td>\n", | |
" <td>10</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>4.228385</td>\n", | |
" <td>...</td>\n", | |
" <td>16.480294</td>\n", | |
" <td>0.256106</td>\n", | |
" <td>0.329678</td>\n", | |
" <td>0.365830</td>\n", | |
" <td>0.332703</td>\n", | |
" <td>12.035714</td>\n", | |
" <td>10.7</td>\n", | |
" <td>15.169480</td>\n", | |
" <td>16.380903</td>\n", | |
" <td><pysal.cg.shapes.Polygon object at 0x7f642612d...</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>5 rows × 70 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" FIPSNO NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS STFIPS \\\n", | |
"0 54029 Hancock West Virginia 54 029 54029 54 \n", | |
"1 54009 Brooke West Virginia 54 009 54009 54 \n", | |
"2 54069 Ohio West Virginia 54 069 54069 54 \n", | |
"3 54051 Marshall West Virginia 54 051 54051 54 \n", | |
"4 10003 New Castle Delaware 10 003 10003 10 \n", | |
"\n", | |
" COFIPS SOUTH HR60 ... \\\n", | |
"0 29 1 1.682864 ... \n", | |
"1 9 1 4.607233 ... \n", | |
"2 69 1 0.974132 ... \n", | |
"3 51 1 0.876248 ... \n", | |
"4 3 1 4.228385 ... \n", | |
"\n", | |
" BLK90 GI59 GI69 GI79 GI89 FH60 FH70 \\\n", | |
"0 2.557262 0.223645 0.295377 0.332251 0.363934 9.981297 7.8 \n", | |
"1 0.748370 0.220407 0.318453 0.314165 0.350569 10.929337 8.0 \n", | |
"2 3.310334 0.272398 0.358454 0.376963 0.390534 15.621643 12.9 \n", | |
"3 0.546097 0.227647 0.319580 0.320953 0.377346 11.962834 8.8 \n", | |
"4 16.480294 0.256106 0.329678 0.365830 0.332703 12.035714 10.7 \n", | |
"\n", | |
" FH80 FH90 geometry \n", | |
"0 9.785797 12.604552 <pysal.cg.shapes.Polygon object at 0x7f642612d... \n", | |
"1 10.214990 11.242293 <pysal.cg.shapes.Polygon object at 0x7f642612d... \n", | |
"2 14.716681 17.574021 <pysal.cg.shapes.Polygon object at 0x7f642612d... \n", | |
"3 8.803253 13.564159 <pysal.cg.shapes.Polygon object at 0x7f642612d... \n", | |
"4 15.169480 16.380903 <pysal.cg.shapes.Polygon object at 0x7f642612d... \n", | |
"\n", | |
"[5 rows x 70 columns]" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table_handler.to_df().head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"table = ps.pdio.read_files(table_path)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>FIPSNO</th>\n", | |
" <th>NAME</th>\n", | |
" <th>STATE_NAME</th>\n", | |
" <th>STATE_FIPS</th>\n", | |
" <th>CNTY_FIPS</th>\n", | |
" <th>FIPS</th>\n", | |
" <th>STFIPS</th>\n", | |
" <th>COFIPS</th>\n", | |
" <th>SOUTH</th>\n", | |
" <th>HR60</th>\n", | |
" <th>...</th>\n", | |
" <th>BLK90</th>\n", | |
" <th>GI59</th>\n", | |
" <th>GI69</th>\n", | |
" <th>GI79</th>\n", | |
" <th>GI89</th>\n", | |
" <th>FH60</th>\n", | |
" <th>FH70</th>\n", | |
" <th>FH80</th>\n", | |
" <th>FH90</th>\n", | |
" <th>geometry</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>54029</td>\n", | |
" <td>Hancock</td>\n", | |
" <td>West Virginia</td>\n", | |
" <td>54</td>\n", | |
" <td>029</td>\n", | |
" <td>54029</td>\n", | |
" <td>54</td>\n", | |
" <td>29</td>\n", | |
" <td>1</td>\n", | |
" <td>1.682864</td>\n", | |
" <td>...</td>\n", | |
" <td>2.557262</td>\n", | |
" <td>0.223645</td>\n", | |
" <td>0.295377</td>\n", | |
" <td>0.332251</td>\n", | |
" <td>0.363934</td>\n", | |
" <td>9.981297</td>\n", | |
" <td>7.8</td>\n", | |
" <td>9.785797</td>\n", | |
" <td>12.604552</td>\n", | |
" <td><pysal.cg.shapes.Polygon object at 0x7f64257b7...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>54009</td>\n", | |
" <td>Brooke</td>\n", | |
" <td>West Virginia</td>\n", | |
" <td>54</td>\n", | |
" <td>009</td>\n", | |
" <td>54009</td>\n", | |
" <td>54</td>\n", | |
" <td>9</td>\n", | |
" <td>1</td>\n", | |
" <td>4.607233</td>\n", | |
" <td>...</td>\n", | |
" <td>0.748370</td>\n", | |
" <td>0.220407</td>\n", | |
" <td>0.318453</td>\n", | |
" <td>0.314165</td>\n", | |
" <td>0.350569</td>\n", | |
" <td>10.929337</td>\n", | |
" <td>8.0</td>\n", | |
" <td>10.214990</td>\n", | |
" <td>11.242293</td>\n", | |
" <td><pysal.cg.shapes.Polygon object at 0x7f64257b7...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>54069</td>\n", | |
" <td>Ohio</td>\n", | |
" <td>West Virginia</td>\n", | |
" <td>54</td>\n", | |
" <td>069</td>\n", | |
" <td>54069</td>\n", | |
" <td>54</td>\n", | |
" <td>69</td>\n", | |
" <td>1</td>\n", | |
" <td>0.974132</td>\n", | |
" <td>...</td>\n", | |
" <td>3.310334</td>\n", | |
" <td>0.272398</td>\n", | |
" <td>0.358454</td>\n", | |
" <td>0.376963</td>\n", | |
" <td>0.390534</td>\n", | |
" <td>15.621643</td>\n", | |
" <td>12.9</td>\n", | |
" <td>14.716681</td>\n", | |
" <td>17.574021</td>\n", | |
" <td><pysal.cg.shapes.Polygon object at 0x7f64257b7...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>54051</td>\n", | |
" <td>Marshall</td>\n", | |
" <td>West Virginia</td>\n", | |
" <td>54</td>\n", | |
" <td>051</td>\n", | |
" <td>54051</td>\n", | |
" <td>54</td>\n", | |
" <td>51</td>\n", | |
" <td>1</td>\n", | |
" <td>0.876248</td>\n", | |
" <td>...</td>\n", | |
" <td>0.546097</td>\n", | |
" <td>0.227647</td>\n", | |
" <td>0.319580</td>\n", | |
" <td>0.320953</td>\n", | |
" <td>0.377346</td>\n", | |
" <td>11.962834</td>\n", | |
" <td>8.8</td>\n", | |
" <td>8.803253</td>\n", | |
" <td>13.564159</td>\n", | |
" <td><pysal.cg.shapes.Polygon object at 0x7f64257b7...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>10003</td>\n", | |
" <td>New Castle</td>\n", | |
" <td>Delaware</td>\n", | |
" <td>10</td>\n", | |
" <td>003</td>\n", | |
" <td>10003</td>\n", | |
" <td>10</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>4.228385</td>\n", | |
" <td>...</td>\n", | |
" <td>16.480294</td>\n", | |
" <td>0.256106</td>\n", | |
" <td>0.329678</td>\n", | |
" <td>0.365830</td>\n", | |
" <td>0.332703</td>\n", | |
" <td>12.035714</td>\n", | |
" <td>10.7</td>\n", | |
" <td>15.169480</td>\n", | |
" <td>16.380903</td>\n", | |
" <td><pysal.cg.shapes.Polygon object at 0x7f64257b7...</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>5 rows × 70 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" FIPSNO NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS STFIPS \\\n", | |
"0 54029 Hancock West Virginia 54 029 54029 54 \n", | |
"1 54009 Brooke West Virginia 54 009 54009 54 \n", | |
"2 54069 Ohio West Virginia 54 069 54069 54 \n", | |
"3 54051 Marshall West Virginia 54 051 54051 54 \n", | |
"4 10003 New Castle Delaware 10 003 10003 10 \n", | |
"\n", | |
" COFIPS SOUTH HR60 ... \\\n", | |
"0 29 1 1.682864 ... \n", | |
"1 9 1 4.607233 ... \n", | |
"2 69 1 0.974132 ... \n", | |
"3 51 1 0.876248 ... \n", | |
"4 3 1 4.228385 ... \n", | |
"\n", | |
" BLK90 GI59 GI69 GI79 GI89 FH60 FH70 \\\n", | |
"0 2.557262 0.223645 0.295377 0.332251 0.363934 9.981297 7.8 \n", | |
"1 0.748370 0.220407 0.318453 0.314165 0.350569 10.929337 8.0 \n", | |
"2 3.310334 0.272398 0.358454 0.376963 0.390534 15.621643 12.9 \n", | |
"3 0.546097 0.227647 0.319580 0.320953 0.377346 11.962834 8.8 \n", | |
"4 16.480294 0.256106 0.329678 0.365830 0.332703 12.035714 10.7 \n", | |
"\n", | |
" FH80 FH90 geometry \n", | |
"0 9.785797 12.604552 <pysal.cg.shapes.Polygon object at 0x7f64257b7... \n", | |
"1 10.214990 11.242293 <pysal.cg.shapes.Polygon object at 0x7f64257b7... \n", | |
"2 14.716681 17.574021 <pysal.cg.shapes.Polygon object at 0x7f64257b7... \n", | |
"3 8.803253 13.564159 <pysal.cg.shapes.Polygon object at 0x7f64257b7... \n", | |
"4 15.169480 16.380903 <pysal.cg.shapes.Polygon object at 0x7f64257b7... \n", | |
"\n", | |
"[5 rows x 70 columns]" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table.head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In a manner similar to geopandas, there is a geometry column stored in the table:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0 <pysal.cg.shapes.Polygon object at 0x7f64257b7...\n", | |
"1 <pysal.cg.shapes.Polygon object at 0x7f64257b7...\n", | |
"2 <pysal.cg.shapes.Polygon object at 0x7f64257b7...\n", | |
"3 <pysal.cg.shapes.Polygon object at 0x7f64257b7...\n", | |
"4 <pysal.cg.shapes.Polygon object at 0x7f64257b7...\n", | |
"Name: geometry, dtype: object" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table.geometry.head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"But, unlike geopandas, these tables are flat pandas tables, not subclasses. Thus, the `geometry` column is just like any other pandas column that stores rich objects. " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Weights in memory" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"One thing that this makes more simple is the construction of weights from an arbitrary set of geometries. Before, the library focused primarily on providing spatial weighting functions in `weights.user`, and most of these targeted either arrays of points or shapefiles. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"['DistanceBand',\n", | |
" 'Kernel',\n", | |
" '__all__',\n", | |
" '__author__',\n", | |
" '__builtins__',\n", | |
" '__doc__',\n", | |
" '__file__',\n", | |
" '__name__',\n", | |
" '__package__',\n", | |
" '_test',\n", | |
" 'adaptive_kernelW',\n", | |
" 'adaptive_kernelW_from_shapefile',\n", | |
" 'buildContiguity',\n", | |
" 'build_lattice_shapefile',\n", | |
" 'get_ids',\n", | |
" 'get_points_array_from_shapefile',\n", | |
" 'kernelW',\n", | |
" 'kernelW_from_shapefile',\n", | |
" 'knnW',\n", | |
" 'knnW_from_array',\n", | |
" 'knnW_from_shapefile',\n", | |
" 'min_threshold_dist_from_shapefile',\n", | |
" 'min_threshold_distance',\n", | |
" 'np',\n", | |
" 'pysal',\n", | |
" 'queen_from_shapefile',\n", | |
" 'rook_from_shapefile',\n", | |
" 'spw_from_gal',\n", | |
" 'threshold_binaryW_from_array',\n", | |
" 'threshold_binaryW_from_shapefile',\n", | |
" 'threshold_continuousW_from_array',\n", | |
" 'threshold_continuousW_from_shapefile']" | |
] | |
}, | |
"execution_count": 15, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dir(ps.weights.user)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Instead of these, I implemented this style of interface as a `classmethod` on the standard weights classes:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"['from_WSP', 'from_dataframe', 'from_file', 'from_iterable', 'from_shapefile']" | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"[m for m in dir(ps.weights.Rook) if m.startswith('from')]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"By default, `ps.weights.CLASSNAME` constructs a spatial weights object when passed a set of geometric primitives in the the basic `PySAL` computational geometry classes. all of the classmethods, then, are methods to cast input to a form that that can be passed to the class constructor. For example, the `from_dataframe` method peels off a geometry column of a dataframe (argument: `geom_col`, default: `geometry`) and then calls `class.from_iterable` on the resulting collection. `from_iterable`, then will cast the shapes to `PySAL` shape objects (if needed), and then send them to the class constructor." | |
] | |
}, | |
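The delegation chain described above can be illustrated with a small, self-contained sketch. `ToyQueen` is a hypothetical stand-in, not PySAL's implementation: shapes are plain tuples of vertices, the "dataframe" is any mapping from column names to sequences, and two shapes count as neighbors when they share at least one vertex.

```python
from itertools import combinations


class ToyQueen(object):
    """Hypothetical stand-in for a contiguity weights class (not PySAL code)."""

    def __init__(self, shapes):
        # The constructor only accepts shapes already in the "native" form:
        # a list of tuples of (x, y) vertex tuples.
        self.neighbors = {i: [] for i in range(len(shapes))}
        for i, j in combinations(range(len(shapes)), 2):
            # Queen-style rule: sharing at least one vertex makes neighbors.
            if set(shapes[i]) & set(shapes[j]):
                self.neighbors[i].append(j)
                self.neighbors[j].append(i)

    @classmethod
    def from_iterable(cls, iterable):
        # Cast each shape to the native form, then call the constructor.
        shapes = [tuple(tuple(vertex) for vertex in shape) for shape in iterable]
        return cls(shapes)

    @classmethod
    def from_dataframe(cls, df, geom_col='geometry'):
        # Peel off the geometry column and delegate to from_iterable.
        return cls.from_iterable(df[geom_col])


# Three unit squares in a row; a dict of columns stands in for a dataframe.
squares = [
    [[0, 0], [1, 0], [1, 1], [0, 1]],
    [[1, 0], [2, 0], [2, 1], [1, 1]],
    [[2, 0], [3, 0], [3, 1], [2, 1]],
]
toy_table = {'name': ['a', 'b', 'c'], 'geometry': squares}
w = ToyQueen.from_dataframe(toy_table)
print(w.neighbors)
```

Because each alternative constructor only converts its input and hands off to the next one, adding a new input type means adding one more classmethod rather than a new module-level function per input and weights type.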
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<pysal.weights.Contiguity.Queen at 0x7f6425dc0750>" | |
] | |
}, | |
"execution_count": 17, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ps.weights.Queen.from_dataframe(table)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This means that geopandas dataframes can be used as well. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import geopandas as gpd" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"gdf = gpd.read_file(table_path)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<pysal.weights.Contiguity.Queen at 0x7f6425cfbf90>" | |
] | |
}, | |
"execution_count": 20, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ps.weights.Queen.from_dataframe(gdf)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This makes it much easier to work with spatial weights, since it allows the user to do whatever data pipelining needs to be done directly in (geo)pandas. For example, generating a state-specific weights matrix from county-level data, we can use the tabular GIS operations implemented in the `geotable` module: " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"from pysal.contrib.geotable import ops as GIS" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"states = GIS.tabular.dissolve(table, 'STATE_FIPS')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<pysal.weights.Contiguity.Queen at 0x7f64252c6b50>" | |
] | |
}, | |
"execution_count": 23, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ps.weights.Queen.from_iterable(states)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"*Every* weighting class in PySAL now has these methods, so this involves new methods over:\n", | |
"- DistanceBand\n", | |
"- Kernel\n", | |
"- Queen (new class)\n", | |
"- Rook (new class)\n", | |
"- KNN (new class)\n", | |
"- Weights\n", | |
"- WSP" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Spatial Operations" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In addition, the geotable module was added. This is for ancillary functions to make it easier to work with spatial data. Above, we used the `ops` module, which contains tabular and atomic geographic information system operations that apply to tables. \n", | |
"\n", | |
"These are a mix of `pysal` functions and wrappers around `shapely` that have been extended to apply to tables, or wrappers around `geopandas`. In addition, since the provider of a generic spatial function (like `centroid`) might conflict between various sources, a `config` module was added to allow the user to enforce specific sourcing." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"from pysal.contrib import geotable as gt\n", | |
"from pysal.contrib.geotable import config as gtconfig" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"['clip',\n", | |
" 'difference',\n", | |
" 'dissolve',\n", | |
" 'erase',\n", | |
" 'intersection',\n", | |
" 'join',\n", | |
" 'spatial_join',\n", | |
" 'spatial_overlay',\n", | |
" 'symmetric_difference',\n", | |
" 'to_df',\n", | |
" 'to_gdf',\n", | |
" 'union']" | |
] | |
}, | |
"execution_count": 25, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"[x for x in dir(GIS.tabular) if not x.startswith('_')]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"['almost_equals',\n", | |
" 'area',\n", | |
" 'bbox',\n", | |
" 'boundary',\n", | |
" 'bounding_box',\n", | |
" 'bounds',\n", | |
" 'buffer',\n", | |
" 'cascaded_intersection',\n", | |
" 'cascaded_union',\n", | |
" 'centroid',\n", | |
" 'contains',\n", | |
" 'convex_hull',\n", | |
" 'crosses',\n", | |
" 'difference',\n", | |
" 'disjoint',\n", | |
" 'distance',\n", | |
" 'envelope',\n", | |
" 'equals',\n", | |
" 'equals_exact',\n", | |
" 'get_attr',\n", | |
" 'has_z',\n", | |
" 'holes',\n", | |
" 'interpolate',\n", | |
" 'intersection',\n", | |
" 'intersects',\n", | |
" 'is_empty',\n", | |
" 'is_ring',\n", | |
" 'is_simple',\n", | |
" 'is_valid',\n", | |
" 'k',\n", | |
" 'len',\n", | |
" 'length',\n", | |
" 'overlaps',\n", | |
" 'parts',\n", | |
" 'perimeter',\n", | |
" 'project',\n", | |
" 'relate',\n", | |
" 'representative_point',\n", | |
" 'segments',\n", | |
" 'simplify',\n", | |
" 'symmetric_difference',\n", | |
" 'to_wkb',\n", | |
" 'to_wkt',\n", | |
" 'touches',\n", | |
" 'unary_union',\n", | |
" 'union',\n", | |
" 'vertices',\n", | |
" 'warn',\n", | |
" 'within']" | |
] | |
}, | |
"execution_count": 26, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"[x for x in dir(GIS.atomic) if not x.startswith('_')]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Exploratory Statistics" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The exploratory statistics module has new functionality to work with these new tables. First, series can be used in any of the local statistical functions, smoothers, or map classifiers:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
" Quantiles \n", | |
" \n", | |
"Lower Upper Count\n", | |
"=========================================\n", | |
" x[i] <= 4.180 283\n", | |
" 4.180 < x[i] <= 6.904 282\n", | |
" 6.904 < x[i] <= 9.783 282\n", | |
" 9.783 < x[i] <= 14.278 282\n", | |
"14.278 < x[i] <= 64.261 283" | |
] | |
}, | |
"execution_count": 27, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ps.esda.mapclassify.Quantiles(table.HR90)" | |
] | |
}, | |
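To make the class table above easier to read, here is a rough standard-library sketch of quantile classification. `quantile_classes` is a hypothetical helper, not PySAL's `Quantiles`. It assumes Python 3.8+ for `statistics.quantiles`, and its cut points may differ slightly from the ones PySAL computes.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_classes(values, k=5):
    """Return (upper_bounds, counts) for k quantile classes (illustrative)."""
    values = list(values)
    # k - 1 interior cut points, plus the maximum as the last upper bound.
    upper_bounds = quantiles(values, n=k, method='inclusive') + [max(values)]
    counts = [0] * k
    for v in values:
        # bisect_left picks the first class whose upper bound is >= v,
        # i.e. lower < v <= upper, as in the printed table.
        counts[bisect_left(upper_bounds, v)] += 1
    return upper_bounds, counts


bounds, counts = quantile_classes(range(1, 101), k=5)
print(bounds)
print(counts)
```

Any sequence of numbers works as input, which mirrors the point made above that a `pandas` series can be passed directly to the classifier.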
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"e = np.random.randint(1,20,size=(100,1))\n", | |
"b = np.asarray([np.random.randint(x, (x+1)**2) for x in e]).reshape(-1,1)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"rates =pd.DataFrame(np.hstack((e,b)), columns=['event', 'popn'])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([[ 0.08243715],\n", | |
" [ 0.18516543],\n", | |
" [ 0.19699649],\n", | |
" [ 0.16006168],\n", | |
" [ 0.08194574]])" | |
] | |
}, | |
"execution_count": 30, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ps.esda.smoothing.Empirical_Bayes(rates.event, rates.popn).r[0:5]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 0.59122198, 0.50869881, 0.40200111, ..., 0.42596763,\n", | |
" 0.14877105, 0.6293534 ])" | |
] | |
}, | |
"execution_count": 31, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ps.esda.moran.Moran_Local(table.HR80, \n", | |
" ps.weights.Queen.from_dataframe(table)).Is" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In addition, smoothers and statistics now have a `by_col` method that applies the relevant technique to many columns of a dataframe at once and collects the results in bulk:" | |
] | |
}, | |
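The pattern behind `by_col` can be sketched without PySAL. In the minimal sketch below, a plain dict of lists stands in for the dataframe and `zscore` stands in for a real statistic. The `by_col` function, its `suffix` argument, and the `<column>_<suffix>` naming are assumptions made for illustration, not PySAL's implementation:

```python
def zscore(values):
    """Stand-in statistic: standardise one column (population std. dev.)."""
    n = len(values)
    mean = sum(values) / float(n)
    sd = (sum((v - mean) ** 2 for v in values) / float(n)) ** 0.5
    return [(v - mean) / sd for v in values]


def by_col(table, cols, stat, suffix, inplace=False):
    """Apply `stat` to each named column and store the result as a new column.

    `table` is a dict of column name -> list of values (hypothetical stand-in
    for a dataframe). New columns are named "<col>_<suffix>".
    """
    # A shallow copy is enough here: columns are added, never mutated.
    out = table if inplace else dict(table)
    for col in cols:
        out["{}_{}".format(col, suffix)] = stat(table[col])
    return None if inplace else out


table = {"GI59": [1.0, 2.0, 3.0], "GI69": [2.0, 4.0, 6.0]}
by_col(table, ["GI59", "GI69"], zscore, "z", inplace=True)
print(sorted(table))
```

With `inplace=True` the sketch returns nothing and writes the new columns into the table it was given; with `inplace=False` the original table is left untouched and an extended copy is returned.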
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"GIcols = [col for col in table.columns if col.startswith('GI')]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"ps.esda.moran.Moran_Local.by_col(table, GIcols,\n", | |
" ps.weights.Queen.from_dataframe(table),\n", | |
" inplace=True)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>GI59_moran_local</th>\n", | |
" <th>GI69_moran_local</th>\n", | |
" <th>GI79_moran_local</th>\n", | |
" <th>GI89_moran_local</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>7.603954</td>\n", | |
" <td>4.879720</td>\n", | |
" <td>3.273131</td>\n", | |
" <td>1.008983</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>6.062181</td>\n", | |
" <td>3.288675</td>\n", | |
" <td>1.869984</td>\n", | |
" <td>0.547154</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>4.387870</td>\n", | |
" <td>1.684638</td>\n", | |
" <td>0.444277</td>\n", | |
" <td>0.058397</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>3.399893</td>\n", | |
" <td>1.521413</td>\n", | |
" <td>1.194298</td>\n", | |
" <td>-0.015205</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>2.680893</td>\n", | |
" <td>1.830299</td>\n", | |
" <td>0.396829</td>\n", | |
" <td>1.822810</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" GI59_moran_local GI69_moran_local GI79_moran_local GI89_moran_local\n", | |
"0 7.603954 4.879720 3.273131 1.008983\n", | |
"1 6.062181 3.288675 1.869984 0.547154\n", | |
"2 4.387870 1.684638 0.444277 0.058397\n", | |
"3 3.399893 1.521413 1.194298 -0.015205\n", | |
"4 2.680893 1.830299 0.396829 1.822810" | |
] | |
}, | |
"execution_count": 34, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table[[col for col in table.columns if col.endswith('local')]].head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In addition, a new method for interacting with classifier functions was devised to make them easy to use with dataframes. This method, the `make` method, works by partially applying the classifier over a set of configuration arguments. The result is a function that takes data and returns the bins of the data. " | |
] | |
}, | |
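The partial application behind `make` can be sketched with `functools.partial`. This is a minimal sketch under stated assumptions: `classify` is a hypothetical nearest-rank quantile classifier standing in for a PySAL classifier such as `Fisher_Jenks`, and this `make` is a free function rather than the class method PySAL provides:

```python
from bisect import bisect_left
from functools import partial


def quantile_breaks(values, k):
    """Upper bounds of k equal-count classes (nearest-rank quantiles)."""
    ordered = sorted(values)
    n = len(ordered)
    # -(-a // b) is ceil(a / b); the i-th break sits at rank ceil(i * n / k).
    return [ordered[-(-i * n // k) - 1] for i in range(1, k + 1)]


def classify(values, k=5):
    """Return the class index (0 .. k-1) of every value."""
    breaks = quantile_breaks(values, k)
    # A value belongs to the first class whose upper bound is >= the value.
    return [bisect_left(breaks, v) for v in values]


def make(**config):
    """Fix the configuration now; supply the data later."""
    return partial(classify, **config)


classifier = make(k=4)
labels = classifier([1, 2, 3, 4, 5, 6, 7, 8])
print(labels)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Because the returned object is an ordinary function of the data alone, it can be handed to anything that applies a function column by column.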
{ | |
"cell_type": "code", | |
"execution_count": 35, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<function pysal.esda.mapclassify.classifier>" | |
] | |
}, | |
"execution_count": 35, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ps.Fisher_Jenks.make(k=9)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This makes it simple to do the following:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>GI59</th>\n", | |
" <th>GI69</th>\n", | |
" <th>GI79</th>\n", | |
" <th>GI89</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>3</td>\n", | |
" <td>4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>3</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" GI59 GI69 GI79 GI89\n", | |
"0 0 0 0 2\n", | |
"1 0 0 0 1\n", | |
"2 1 2 3 4\n", | |
"3 0 0 0 3\n", | |
"4 1 1 2 1" | |
] | |
}, | |
"execution_count": 36, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table[GIcols].apply(ps.Fisher_Jenks.make(k=9)).head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This also enables updating/rolling classifiers. These are classifiers that append *new* data to data that's already been viewed, and return the classifications of the new data in the distribution seen so far:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 37, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<function pysal.esda.mapclassify.classifier>" | |
] | |
}, | |
"execution_count": 37, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ps.Fisher_Jenks.make(k=9, rolling=True)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 38, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>GI59</th>\n", | |
" <th>GI69</th>\n", | |
" <th>GI79</th>\n", | |
" <th>GI89</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>0</td>\n", | |
" <td>2</td>\n", | |
" <td>3</td>\n", | |
" <td>4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>0</td>\n", | |
" <td>3</td>\n", | |
" <td>2</td>\n", | |
" <td>3</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>1</td>\n", | |
" <td>4</td>\n", | |
" <td>5</td>\n", | |
" <td>5</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>0</td>\n", | |
" <td>3</td>\n", | |
" <td>2</td>\n", | |
" <td>4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>1</td>\n", | |
" <td>3</td>\n", | |
" <td>4</td>\n", | |
" <td>2</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" GI59 GI69 GI79 GI89\n", | |
"0 0 2 3 4\n", | |
"1 0 3 2 3\n", | |
"2 1 4 5 5\n", | |
"3 0 3 2 4\n", | |
"4 1 3 4 2" | |
] | |
}, | |
"execution_count": 38, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table[GIcols].apply(ps.Fisher_Jenks.make(k=9, rolling=True)).head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Solidifying the Soft Dependency Logic" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As part of this work, a new set of [operating guidelines](https://github.com/pysal/pysal/wiki/Using-soft-dependencies) was added to the project for modules or functionality that rely on dependencies more advanced than our default numpy/scipy dependencies. In addition, tools for continuous integration testing and safe code interoperability were added to the project so that PySAL can be used in two environments. \n", | |
"\n", | |
"The first environment covers the case where a user has most of the PyData ecosystem installed. Code in this environment, called the `PySAL Plus` environment, can use:\n", | |
"\n", | |
"- Matplotlib/seaborn\n", | |
"- pandas\n", | |
"- dask\n", | |
"- numba\n", | |
"\n", | |
"The work here supplements ongoing efforts on [visualization](https://github.com/pysal/pysal/pull/844) and [performance enhancements](https://github.com/pysal/pysal/pull/829), and provides a continuous integration environment in which the GSOC code can be tested. " | |
] | |
}, | |
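One common way to write code that runs in both environments is an optional import with a fallback. The sketch below shows that general pattern only; it is not taken from the guidelines linked above, and `optional_import`, `HAS_PANDAS`, and `to_table` are hypothetical names:

```python
import importlib


def optional_import(name):
    """Return the named module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# pandas is treated as optional; numpy/scipy would be imported normally.
pd = optional_import("pandas")
HAS_PANDAS = pd is not None


def to_table(columns):
    """Build a dataframe when pandas is available, else fall back to a dict."""
    if HAS_PANDAS:
        return pd.DataFrame(columns)
    return dict(columns)


table = to_table({"event": [3, 5], "popn": [40, 90]})
print(sorted(table))  # column names, whichever container was built
```

Keeping the check in one place means the rest of a module can branch on a single flag, and a test suite can be run once with the optional packages installed and once without.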
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Ancillary Work " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### [Closed Issues](https://github.com/pysal/pysal/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20label%3Aljwolf_GSOC%20)\n", | |
"\n", | |
"These are issues that came up while conducting the work:\n", | |
"\n", | |
"#### Spatial filtering grid definition [#826](https://github.com/pysal/pysal/issues/826)\n", | |
"The spatial filtering for rates did not return the correct gridding due to a float division error. Troubleshooting this failure was difficult because the unit test was itself wrong. \n", | |
"\n", | |
"\n", | |
"#### Headbanging Median Rate ignores edge correction [#825](https://github.com/pysal/pysal/issues/825)\n", | |
"The headbanging median rate never entered its edge correction branch, so any request for edge correction by the user was ignored. Once the branch could be reached, the edge correction code itself did not work and also had to be fixed. \n", | |
"\n", | |
"#### Direct Age Standardization fails for empty regions [#824](https://github.com/pysal/pysal/issues/824)\n", | |
"Another case of incomplete test coverage. One of the conditional branches did not work because it called the wrong function from `scipy`'s distributions, but the branch is only reached in corner cases.\n", | |
"\n", | |
"#### Exception TypeError in `geoda_txt.py` [#816](https://github.com/pysal/pysal/issues/816)\n", | |
"Encountered while ensuring test completeness for the tabular reading functions, this bug was straightforward to fix once identified.\n", | |
"\n", | |
"#### w subset error [#799](https://github.com/pysal/pysal/issues/799)\n", | |
"Reported by a user, this bug is a clear example of a place where the API, while documented, is not intuitive. \n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### [Open Issues](https://github.com/pysal/pysal/issues?q=is%3Aissue+is%3Aopen+label%3Aljwolf_GSOC)\n", | |
"\n", | |
"A few issues were either opened during this GSOC or assigned to it while it was underway. The ones that remain open pertain to work at the tail end of the project, or are longstanding tracking issues for a particular topic.\n", | |
"\n", | |
"#### contains point and nested rings [#852](https://github.com/pysal/pysal/issues/852)\n", | |
"While it's a legal polygon according to the Open Geospatial Consortium standards, a polygon with concentric rings/holes returns the incorrect value when the `contains_point` function is used. \n", | |
"\n", | |
"#### wkt parser fails [#820](https://github.com/pysal/pysal/issues/820)\n", | |
"This belongs to a group of serialization errors that the library, in my opinion, should avoid by staying out of serialization/deserialization work altogether. This bug relates to incorrect behavior in the WKT deserializer, with holes not being assigned correctly as holes.\n", | |
"\n", | |
"#### wkt/wkb writer for data tables [#812](https://github.com/pysal/pysal/issues/812)\n", | |
"This is an enhancement request to work better with the tabular functions mentioned in this document. Due to PySAL's non-OGC geometric hierarchy, this proved more difficult to complete than expected, and remains undone. To cover the full range of possible OGC shapes, the rings of a multipolygon with multiple holes in each part must first be topologically sorted (in case there are nested rings/holes), and then each hole must be assigned to the correct exterior ring. While #852 remains unresolved, this does too. \n", | |
"\n", | |
"#### optimize clockwise test [#88](https://github.com/pysal/pysal/issues/88)\n", | |
"Some work with Cython and with JIT compilation through Numba was done here. It targeted the Sedgewick counterclockwise test currently used for points and line segments in `pysal.cg`, but neither tool produced any improvement. \n", | |
"\n", | |
"#### Tracking Issues\n", | |
"The rest of the open issues are long-term tracking issues that, while opened or assigned to this GSOC, are not necessarily *closed* by this GSOC, since there is no end to their tracking horizon. \n" | |
] | |
}, | |
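The nested-ring case described in #852 can be made concrete with the even-odd rule: cast a ray from the point and count how many ring edges it crosses, over all rings together. This is a sketch of that general technique, not of `contains_point` in `pysal.cg` and not a proposed fix; the `crossings`, `contains_point`, and `square` helpers are hypothetical, and points lying exactly on a ring are not handled:

```python
def crossings(ring, x, y):
    """Count edges of a closed ring crossed by a ray going right from (x, y)."""
    count = 0
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        # The edge straddles the horizontal line through y.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that line.
            x_at_y = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            if x_at_y > x:
                count += 1
    return count


def contains_point(rings, point):
    """Even-odd rule over all rings: an odd crossing count means inside."""
    x, y = point
    return sum(crossings(ring, x, y) for ring in rings) % 2 == 1


def square(lo, hi):
    """Axis-aligned square ring with corners (lo, lo) and (hi, hi)."""
    return [(lo, lo), (hi, lo), (hi, hi), (lo, hi)]


# Exterior ring, a hole inside it, and an "island" nested inside the hole.
rings = [square(0, 10), square(2, 8), square(4, 6)]

print(contains_point(rings, (1, 1)))   # True: between exterior and hole
print(contains_point(rings, (3, 3)))   # False: inside the hole
print(contains_point(rings, (5, 5)))   # True: on the nested island
print(contains_point(rings, (11, 5)))  # False: outside everything
```

The even-odd count needs no knowledge of which ring is a hole of which exterior, which is why it copes with nesting. Writing WKT/WKB, by contrast, does need that assignment, which is the topological sorting step mentioned for #812.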
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Next Steps\n", | |
"\n", | |
"This project taught me a few things about both the architecture of a Python module and how to encourage sustainable contribution in academic software. \n", | |
"\n", | |
"As I detail [here](http://ljwolf.org/post/147934748469/a-post-scipy-chicago-update), the trade-offs that academic authors make in writing code that has pedagogical value, is easily redistributable, and can enable novel research are difficult to gauge. I believe that the library sits at a position of remarkable potential, but the maintenance burden is growing at a rate that is unsustainable. A significant portion of the code should be \"outsourced\" to packages that have comparative advantage in things like computational geometry or sparse network data structures. \n", | |
"\n", | |
"This concern is, in part, why the remaining issues pertain to the computational geometry classes in PySAL, and have not been closed/resolved. Reducing the maintenance burden was part of the goal of this GSOC. Increasing the amount of custom code maintained by the library for \"enhancements\" like wkt/wkb parsing would be unwise and run counter to this goal. Unfortunately, even for the code that has been written in this project, unless some existing code is deprecated, the maintenance burden has only been increased.\n", | |
"\n", | |
"Extending the rest of the library, such as the `spatial_dynamics` module or the `spreg` spatial regression module, to use pandas dataframes natively would also be a logical next step, but it is less clear how to extend this interaction method to those parts of the library without first assessing what the intended API of each module should be. Since these discussions often invite bikeshedding (and, indeed, recall past bikeshedding in this case), the extensions remain to be completed. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 2", | |
"language": "python", | |
"name": "python2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.11" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |