@isc-rsingh
Created July 30, 2017 02:16
Urbanity in the US, Categorizing urban density
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Categorizing urban density\n",
"\n",
"last modified: July 2017\n",
"\n",
"author: [Raj Singh](https://developer.ibm.com/clouddataservices/author/rrsingh/)\n",
"\n",
"original: https://github.com/ibm-cds-labs/open-data/blob/master/samples/urbanity.ipynb\n",
"\n",
"\n",
"## Overview\n",
"\n",
"This notebook explores an academic study of urban structure and density described in the June 2014 article [\"From Jurisdictional to Functional Analysis of Urban Cores & Suburbs\"](http://www.newgeography.com/content/004349-from-jurisdictional-functional-analysis-urban-cores-suburbs) in [New Geography](http://www.newgeography.com/).\n",
"\n",
"## Categories\n",
"\n",
"Each zip code area falls into one of four categories, based on its population density and the median year its housing was built:\n",
"\n",
"- Urban (pre-auto urban core): density of at least 2,900 persons per sq. km\n",
"- Auto suburban, early: median house built 1946 to 1979, density of at least 100 but less than 2,900 persons per sq. km\n",
"- Auto suburban, later: median house built after 1979, density of at least 100 but less than 2,900 persons per sq. km\n",
"- Auto exurban: all others\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd, numpy as np, os"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Collect U.S. Census data\n",
"\n",
"The data come from the 2013 US Census American Community Survey (ACS) 5-year estimates.\n",
"\n",
"They were created from the \"zip code tabulation area\" (ZCTA) [TIGER/Line® with Selected Demographic and Economic Data product in Geodatabase format](http://www.census.gov/geo/maps-data/data/tiger-data.html). This particular version of the ACS is used for the following reasons:\n",
"\n",
"1. 5-year estimates are the most accurate data outside of the decennial census [as explained here](http://www.census.gov/programs-surveys/acs/guidance/estimates.html).\n",
"1. As of this writing, 2013 is the most recent data set with 5-year estimates.\n",
"1. TIGER/Line® gives you the geographic boundaries of the zip codes so you can perform spatial analyses.\n",
"1. This data set is smaller than the full Census, but still has the important income, education, race, age and occupation demographics we want to use.\n",
"\n",
"If you want to do this yourself, [this article](https://developer.ibm.com/clouddataservices/2015/09/08/census-open-data-on-ibm-cloud/) explains how to get a CSV out of that format.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get zip code areas from Census"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ALAND10</th>\n",
" </tr>\n",
" <tr>\n",
" <th>GEOID</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>86000US43451</th>\n",
" <td>63411475</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US43452</th>\n",
" <td>121783680</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US43456</th>\n",
" <td>9389360</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US43457</th>\n",
" <td>48035540</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US43458</th>\n",
" <td>2573816</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ALAND10\n",
"GEOID \n",
"86000US43451 63411475\n",
"86000US43452 121783680\n",
"86000US43456 9389360\n",
"86000US43457 48035540\n",
"86000US43458 2573816"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"AREAS_URL = \"https://openobjectstore.mybluemix.net/censusacs2013zip/areas.csv\"\n",
"geo_df = pd.read_csv( AREAS_URL, usecols=['GEOID_Data','ALAND10'], dtype={\"GEOID_Data\": str, \"ALAND10\": \"int64\"} )\n",
"# ALAND10 is land area in square meters; rename the ID column by name, not by position\n",
"geo_df = geo_df.rename(columns={'GEOID_Data': 'GEOID'})\n",
"geo_df = geo_df.set_index('GEOID')\n",
"geo_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>POPULATION</th>\n",
" </tr>\n",
" <tr>\n",
" <th>GEOID</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>86000US01001</th>\n",
" <td>17245</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US01002</th>\n",
" <td>29266</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US01003</th>\n",
" <td>11032</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US01005</th>\n",
" <td>5356</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US01007</th>\n",
" <td>14673</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" POPULATION\n",
"GEOID \n",
"86000US01001 17245\n",
"86000US01002 29266\n",
"86000US01003 11032\n",
"86000US01005 5356\n",
"86000US01007 14673"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"POP_URL = \"https://apsportal.ibm.com/exchange-api/v1/entries/beb8c30a3f559e58716d983671b65c10/data?accessKey=afb441da02fb0f7f9dcfad44ec5d4b22\"\n",
"pop_df = pd.read_csv( POP_URL, usecols=['GEOID','B01001e1'], dtype={\"GEOID\": str} )\n",
"# B01001e1 is the total population estimate; rename it by name, not by position\n",
"pop_df = pop_df.rename(columns={'B01001e1': 'POPULATION'})\n",
"pop_df = pop_df.set_index('GEOID')\n",
"pop_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get housing age from Census\n",
"\n",
"NOTE: this file is large (about 210 MB) and may take a few minutes to load."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>B25035e1</th>\n",
" </tr>\n",
" <tr>\n",
" <th>GEOID</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>86000US24483</th>\n",
" <td>1991.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US50653</th>\n",
" <td>1944.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US35504</th>\n",
" <td>1986.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US38619</th>\n",
" <td>1984.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US76469</th>\n",
" <td>1959.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" B25035e1\n",
"GEOID \n",
"86000US24483 1991.0\n",
"86000US50653 1944.0\n",
"86000US35504 1986.0\n",
"86000US38619 1984.0\n",
"86000US76469 1959.0"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"HOUSE_URL = \"https://openobjectstore.mybluemix.net/censusacs2013zip/x25_housing.csv\"\n",
"housing_df = pd.read_csv( HOUSE_URL, usecols=['GEOID','B25035e1'], dtype={\"GEOID\": str} )\n",
"housing_df = housing_df.set_index('GEOID')\n",
"housing_df.sample(5)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ALAND10</th>\n",
" <th>POPULATION</th>\n",
" <th>B25035e1</th>\n",
" </tr>\n",
" <tr>\n",
" <th>GEOID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>86000US01851</th>\n",
" <td>8645547</td>\n",
" <td>29791</td>\n",
" <td>1942.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US74452</th>\n",
" <td>67415917</td>\n",
" <td>457</td>\n",
" <td>1993.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US84755</th>\n",
" <td>207949154</td>\n",
" <td>92</td>\n",
" <td>1985.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US65349</th>\n",
" <td>246384166</td>\n",
" <td>2391</td>\n",
" <td>1955.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US80219</th>\n",
" <td>19441819</td>\n",
" <td>64506</td>\n",
" <td>1956.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ALAND10 POPULATION B25035e1\n",
"GEOID \n",
"86000US01851 8645547 29791 1942.0\n",
"86000US74452 67415917 457 1993.0\n",
"86000US84755 207949154 92 1985.0\n",
"86000US65349 246384166 2391 1955.0\n",
"86000US80219 19441819 64506 1956.0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"urban_df = geo_df.join(pop_df)\n",
"urban_df = urban_df.join(housing_df)\n",
"urban_df.sample(5)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>AREAMSQ</th>\n",
" <th>Population</th>\n",
" <th>MEDYRBUILT</th>\n",
" </tr>\n",
" <tr>\n",
" <th>GEOID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>86000US30308</th>\n",
" <td>4130054</td>\n",
" <td>16434</td>\n",
" <td>1975.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US18234</th>\n",
" <td>670491</td>\n",
" <td>419</td>\n",
" <td>1939.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US72029</th>\n",
" <td>131043756</td>\n",
" <td>1776</td>\n",
" <td>1966.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US52535</th>\n",
" <td>149442215</td>\n",
" <td>884</td>\n",
" <td>1953.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US28387</th>\n",
" <td>75317963</td>\n",
" <td>13980</td>\n",
" <td>1980.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" AREAMSQ Population MEDYRBUILT\n",
"GEOID \n",
"86000US30308 4130054 16434 1975.0\n",
"86000US18234 670491 419 1939.0\n",
"86000US72029 131043756 1776 1966.0\n",
"86000US52535 149442215 884 1953.0\n",
"86000US28387 75317963 13980 1980.0"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"urban_df.columns = ['AREAMSQ','Population','MEDYRBUILT']\n",
"urban_df.sample(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Density calculation\n",
"### Compute population density as persons per square kilometer\n",
"- persons per square km = persons / (area in square meters / 1,000,000)\n",
"- persons per hectare = persons / (area in square meters / 10,000)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>AREAMSQ</th>\n",
" <th>Population</th>\n",
" <th>MEDYRBUILT</th>\n",
" <th>POPPERKMSQ</th>\n",
" </tr>\n",
" <tr>\n",
" <th>GEOID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>86000US35974</th>\n",
" <td>34030088</td>\n",
" <td>1482</td>\n",
" <td>1975.0</td>\n",
" <td>43.549696</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US68516</th>\n",
" <td>49315802</td>\n",
" <td>39683</td>\n",
" <td>1991.0</td>\n",
" <td>804.671087</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US59868</th>\n",
" <td>468649275</td>\n",
" <td>1675</td>\n",
" <td>1980.0</td>\n",
" <td>3.574101</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US80705</th>\n",
" <td>807565</td>\n",
" <td>850</td>\n",
" <td>1974.0</td>\n",
" <td>1052.546854</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" AREAMSQ Population MEDYRBUILT POPPERKMSQ\n",
"GEOID \n",
"86000US35974 34030088 1482 1975.0 43.549696\n",
"86000US68516 49315802 39683 1991.0 804.671087\n",
"86000US59868 468649275 1675 1980.0 3.574101\n",
"86000US80705 807565 850 1974.0 1052.546854"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"urban_df['POPPERKMSQ'] = urban_df['Population'] / (urban_df['AREAMSQ']/1000000)\n",
"urban_df.sample(4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Group population density into 4 categories"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/src/bluemix_jupyter_bundle.v54/notebook/lib/python2.7/site-packages/ipykernel/__main__.py:2: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame\n",
"\n",
"See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
" from ipykernel import kernelapp as app\n",
"/usr/local/src/bluemix_jupyter_bundle.v54/notebook/lib/python2.7/site-packages/ipykernel/__main__.py:3: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame\n",
"\n",
"See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
" app.launch_new_instance()\n",
"/usr/local/src/bluemix_jupyter_bundle.v54/notebook/lib/python2.7/site-packages/ipykernel/__main__.py:4: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame\n",
"\n",
"See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>AREAMSQ</th>\n",
" <th>Population</th>\n",
" <th>MEDYRBUILT</th>\n",
" <th>POPPERKMSQ</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>3.298900e+04</td>\n",
" <td>32989.000000</td>\n",
" <td>32045.000000</td>\n",
" <td>32989.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>2.250019e+08</td>\n",
" <td>9443.177453</td>\n",
" <td>1971.068529</td>\n",
" <td>487.703913</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>6.575933e+08</td>\n",
" <td>13858.010530</td>\n",
" <td>15.606758</td>\n",
" <td>1912.093435</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>5.094000e+03</td>\n",
" <td>0.000000</td>\n",
" <td>1939.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>2.351832e+07</td>\n",
" <td>719.000000</td>\n",
" <td>1961.000000</td>\n",
" <td>7.754449</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>9.322068e+07</td>\n",
" <td>2781.000000</td>\n",
" <td>1974.000000</td>\n",
" <td>30.194351</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>2.304373e+08</td>\n",
" <td>12830.000000</td>\n",
" <td>1982.000000</td>\n",
" <td>249.247358</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>3.478591e+10</td>\n",
" <td>114734.000000</td>\n",
" <td>2011.000000</td>\n",
" <td>71226.281507</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" AREAMSQ Population MEDYRBUILT POPPERKMSQ\n",
"count 3.298900e+04 32989.000000 32045.000000 32989.000000\n",
"mean 2.250019e+08 9443.177453 1971.068529 487.703913\n",
"std 6.575933e+08 13858.010530 15.606758 1912.093435\n",
"min 5.094000e+03 0.000000 1939.000000 0.000000\n",
"25% 2.351832e+07 719.000000 1961.000000 7.754449\n",
"50% 9.322068e+07 2781.000000 1974.000000 30.194351\n",
"75% 2.304373e+08 12830.000000 1982.000000 249.247358\n",
"max 3.478591e+10 114734.000000 2011.000000 71226.281507"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Default every area to exurban, then overwrite the denser categories.\n",
"# Assign through .loc with a boolean mask; chained indexing (df['CAT'][mask] = ...) may write to a copy.\n",
"suburban = (urban_df['POPPERKMSQ'] < 2900) & (urban_df['POPPERKMSQ'] >= 100)\n",
"urban_df['CAT'] = 'EXURBAN'\n",
"urban_df.loc[urban_df['POPPERKMSQ'] >= 2900, 'CAT'] = 'URBAN'\n",
"urban_df.loc[suburban & (urban_df['MEDYRBUILT'] < 1980) & (urban_df['MEDYRBUILT'] >= 1946), 'CAT'] = 'SUBURBANEARLY'\n",
"urban_df.loc[suburban & (urban_df['MEDYRBUILT'] >= 1980), 'CAT'] = 'SUBURBANLATE'\n",
"urban_df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>AREAMSQ</th>\n",
" <th>Population</th>\n",
" <th>MEDYRBUILT</th>\n",
" <th>POPPERKMSQ</th>\n",
" <th>CAT</th>\n",
" </tr>\n",
" <tr>\n",
" <th>GEOID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>86000US30620</th>\n",
" <td>61601719</td>\n",
" <td>12043</td>\n",
" <td>2000.0</td>\n",
" <td>195.497791</td>\n",
" <td>SUBURBANLATE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US19809</th>\n",
" <td>18110809</td>\n",
" <td>14405</td>\n",
" <td>1953.0</td>\n",
" <td>795.381366</td>\n",
" <td>SUBURBANEARLY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US33573</th>\n",
" <td>45695319</td>\n",
" <td>19566</td>\n",
" <td>1990.0</td>\n",
" <td>428.183902</td>\n",
" <td>SUBURBANLATE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US04732</th>\n",
" <td>1134489630</td>\n",
" <td>1746</td>\n",
" <td>1970.0</td>\n",
" <td>1.539018</td>\n",
" <td>EXURBAN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US66849</th>\n",
" <td>175929144</td>\n",
" <td>483</td>\n",
" <td>1939.0</td>\n",
" <td>2.745423</td>\n",
" <td>EXURBAN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US16048</th>\n",
" <td>2861880</td>\n",
" <td>154</td>\n",
" <td>1942.0</td>\n",
" <td>53.810782</td>\n",
" <td>EXURBAN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US81432</th>\n",
" <td>707150300</td>\n",
" <td>2780</td>\n",
" <td>1995.0</td>\n",
" <td>3.931272</td>\n",
" <td>EXURBAN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US78675</th>\n",
" <td>194898885</td>\n",
" <td>224</td>\n",
" <td>1993.0</td>\n",
" <td>1.149314</td>\n",
" <td>EXURBAN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US98310</th>\n",
" <td>15437635</td>\n",
" <td>19220</td>\n",
" <td>1968.0</td>\n",
" <td>1245.009355</td>\n",
" <td>SUBURBANEARLY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86000US32361</th>\n",
" <td>5938265</td>\n",
" <td>85</td>\n",
" <td>1946.0</td>\n",
" <td>14.313945</td>\n",
" <td>EXURBAN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" AREAMSQ Population MEDYRBUILT POPPERKMSQ CAT\n",
"GEOID \n",
"86000US30620 61601719 12043 2000.0 195.497791 SUBURBANLATE\n",
"86000US19809 18110809 14405 1953.0 795.381366 SUBURBANEARLY\n",
"86000US33573 45695319 19566 1990.0 428.183902 SUBURBANLATE\n",
"86000US04732 1134489630 1746 1970.0 1.539018 EXURBAN\n",
"86000US66849 175929144 483 1939.0 2.745423 EXURBAN\n",
"86000US16048 2861880 154 1942.0 53.810782 EXURBAN\n",
"86000US81432 707150300 2780 1995.0 3.931272 EXURBAN\n",
"86000US78675 194898885 224 1993.0 1.149314 EXURBAN\n",
"86000US98310 15437635 19220 1968.0 1245.009355 SUBURBANEARLY\n",
"86000US32361 5938265 85 1946.0 14.313945 EXURBAN"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# look at a few records to do a quick sanity check\n",
"urban_df.sample(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save the results\n",
"\n",
"[I've saved a copy of the results to a CSV file here](https://openobjectstore.mybluemix.net/misc/urbanity.csv)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2 with Spark 2.0",
"language": "python",
"name": "python2-spark20"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
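
The density and categorization steps in the notebook above can also be tried on a handful of rows without downloading the Census files. The following is a minimal sketch, not the author's code: the helper name `categorize` is hypothetical, `np.select` is swapped in for the notebook's step-by-step assignments, the first three rows are copied from the notebook's sample output, and the fourth row is invented only to exercise the urban branch.

```python
import numpy as np
import pandas as pd

# Three rows copied from the notebook's sample output; the last row is
# invented (an assumption) so that the URBAN branch is exercised too.
sample = pd.DataFrame(
    {
        "AREAMSQ": [61601719, 18110809, 1134489630, 1000000],
        "Population": [12043, 14405, 1746, 5000],
        "MEDYRBUILT": [2000.0, 1953.0, 1970.0, 1930.0],
    },
    index=["86000US30620", "86000US19809", "86000US04732", "HYPOTHETICAL"],
)


def categorize(df):
    """Return one category label per row (hypothetical helper, not from the notebook)."""
    # persons per square km = persons / (area in square meters / 1,000,000)
    density = df["Population"] / (df["AREAMSQ"] / 1000000.0)
    suburban = (density >= 100) & (density < 2900)
    conditions = [
        density >= 2900,
        suburban & (df["MEDYRBUILT"] >= 1946) & (df["MEDYRBUILT"] < 1980),
        suburban & (df["MEDYRBUILT"] >= 1980),
    ]
    labels = ["URBAN", "SUBURBANEARLY", "SUBURBANLATE"]
    # Rows that match none of the conditions fall through to EXURBAN.
    return pd.Series(np.select(conditions, labels, default="EXURBAN"), index=df.index)


sample["CAT"] = categorize(sample)
print(sample["CAT"].to_dict())
```

Listing the conditions in one place makes the thresholds easier to compare against the category definitions, and the function returns a new Series rather than modifying the frame in place, which avoids the view-versus-copy question entirely.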