This is the story of how it took me about 3 hours to pull a key from a census data index into its own column.
The `censusdata` library returns dataframes whose index holds `censusgeo` objects rather than plain strings:
In [1]: import censusdata
In [2]: df = censusdata.download('acs5', 2015, censusdata.censusgeo([('state', '23'), ('county', '005'), ('block group', '*')]), [('C02003_001E')]).head()
In [3]: df
Out[3]:
C02003_001E
Block Group 1, Census Tract 1, Cumberland Count... 508
Block Group 2, Census Tract 1, Cumberland Count... 890
Block Group 3, Census Tract 1, Cumberland Count... 848
Block Group 1, Census Tract 2, Cumberland Count... 891
Block Group 2, Census Tract 2, Cumberland Count... 473
In [4]: df.index.values
Out[4]:
array([censusgeo((('state', '23'), ('county', '005'), ('tract', '000100'), ('block group', '1')), 'Block Group 1, Census Tract 1, Cumberland County, Maine'),
censusgeo((('state', '23'), ('county', '005'), ('tract', '000100'), ('block group', '2')), 'Block Group 2, Census Tract 1, Cumberland County, Maine'),
censusgeo((('state', '23'), ('county', '005'), ('tract', '000100'), ('block group', '3')), 'Block Group 3, Census Tract 1, Cumberland County, Maine'),
censusgeo((('state', '23'), ('county', '005'), ('tract', '000200'), ('block group', '1')), 'Block Group 1, Census Tract 2, Cumberland County, Maine'),
censusgeo((('state', '23'), ('county', '005'), ('tract', '000200'), ('block group', '2')), 'Block Group 2, Census Tract 2, Cumberland County, Maine')],
dtype=object)
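A quick way to check what an index really holds, without relying on how pandas prints it, is to look at the dtype and at the type of one label. The sketch below uses `FakeGeo`, a hypothetical stand-in I made up so the example runs without downloading census data; it is not the real `censusgeo` class, and the two rows are trimmed-down sample values.

```python
import pandas as pd


class FakeGeo:
    """Hypothetical stand-in for a censusgeo object (not the real class)."""

    def __init__(self, geo, name):
        self.geo = tuple(geo)  # tuple of (key, value) pairs
        self.name = name       # human-readable label


rows = [
    FakeGeo((("tract", "000100"), ("block group", "1")), "Block Group 1, Census Tract 1"),
    FakeGeo((("tract", "000100"), ("block group", "2")), "Block Group 2, Census Tract 1"),
]
df = pd.DataFrame({"C02003_001E": [508, 890]}, index=rows)

# The dtype tells you the index stores arbitrary Python objects ...
index_dtype = df.index.dtype
# ... and the type of a single label tells you which kind of object.
first_label_type = type(df.index[0])

print(index_dtype)
print(first_label_type.__name__)
print(isinstance(df.index[0], str))
```

If the last line prints `False`, string methods such as `.str.extract` are not going to find anything to match.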
What I want is to convert block group and census tract to columns, but I can't for the life of me figure out how to do so. Some things I tried are:
# I thought the index was a string at first because pandas prints it out
# like a string unless you magically know to use the `values` attribute to
# show you that it's an object
#
# make it a column (reset_index returns a new dataframe, so assign it back)
df = df.reset_index()
# pull the block group out with a regex (raw string so \d isn't treated as an escape)
df['block group'] = df['index'].str.extract(r'Block Group (\d+)', expand=False)
When the `block group` column came back full of `NaN`s, I got suspicious that the index wasn't a string and managed to figure out what it was. Next up I tried stuff like this:
df['test'] = lambda x: dict(x.index.params)
But it turns out you can't pass a lambda to build a column in this way - you just get a column full of lambdas.
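Here is a small sketch of that difference on a toy dataframe (the `value`, `stored` and `doubled` column names are made up for illustration). Assigning a function to a column stores the function itself in every row, while `map` calls it once per element.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]})

# Assigning a function stores the function object itself in every row.
df["stored"] = lambda v: v * 2
all_callables = all(callable(cell) for cell in df["stored"])

# Mapping calls the function on each element and keeps the results.
df["doubled"] = df["value"].map(lambda v: v * 2)
doubled = df["doubled"].tolist()

print(all_callables)
print(doubled)
```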
So I tried stuff like this:
df['test'] = dict(df.index.params)['Block Group']
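but that just fails because `df.index` is a pandas Index, which has no `params` attribute; `params` lives on each `censusgeo` object inside the index, not on the index itself. (The key would also have needed to be the lowercase `'block group'`.)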
I read all of the relevant pandas documentation that I could find:
- the page on multiindexes, but I seem to have a regular index that contains objects instead of a multiindex
- the cookbook, which gave me hope that `df.index.applymap` might be what I want, but index has no applymap method
- the indexing and selecting page, which finally, finally gave me the answer!
df['block group'] = df.index.map(lambda x: dict(x.params())['block group'])
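The same `index.map` trick should work for the census tract too. Below is a minimal sketch that pulls both keys into columns. To keep it runnable offline it uses `FakeGeo`, a hypothetical stand-in that only mimics the `params()` call used above; it is not the real `censusgeo` class, and the three rows are copied from the output earlier in the post.

```python
import pandas as pd


class FakeGeo:
    """Hypothetical stand-in that mimics only censusgeo's params() call."""

    def __init__(self, geo, name=""):
        self.geo = tuple(geo)
        self.name = name

    def params(self):
        # Returns the tuple of (key, value) pairs, e.g. ('tract', '000100')
        return self.geo


def make_geo(tract, block_group, name):
    return FakeGeo(
        (("state", "23"), ("county", "005"),
         ("tract", tract), ("block group", block_group)),
        name,
    )


index = [
    make_geo("000100", "1", "Block Group 1, Census Tract 1, Cumberland County, Maine"),
    make_geo("000100", "2", "Block Group 2, Census Tract 1, Cumberland County, Maine"),
    make_geo("000200", "1", "Block Group 1, Census Tract 2, Cumberland County, Maine"),
]
df = pd.DataFrame({"C02003_001E": [508, 890, 891]}, index=index)

# Turn each label's (key, value) pairs into a dict and pick out one key.
# key=key binds the current loop value inside the lambda.
for key in ("tract", "block group"):
    df[key] = df.index.map(lambda geo, key=key: dict(geo.params())[key])

print(df[["tract", "block group"]].to_string(index=False))
```

The values come out as strings (`'000100'`, `'1'`), so cast them afterwards if you need numbers.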