@llimllib
Last active March 23, 2021 01:47

This is the story of how it took me about 3 hours to pull a key from a census data index into its own column.

The censusdata library returns dataframes that have an index object:

In [1]: import censusdata

In [2]: df = censusdata.download('acs5', 2015, censusdata.censusgeo([('state', '23'), ('county', '005'), ('block group', '*')]), [('C02003_001E')]).head()

In [3]: df
Out[3]:
                                                    C02003_001E
Block Group 1, Census Tract 1, Cumberland Count...          508
Block Group 2, Census Tract 1, Cumberland Count...          890
Block Group 3, Census Tract 1, Cumberland Count...          848
Block Group 1, Census Tract 2, Cumberland Count...          891
Block Group 2, Census Tract 2, Cumberland Count...          473

In [4]: df.index.values
Out[4]:
array([censusgeo((('state', '23'), ('county', '005'), ('tract', '000100'), ('block group', '1')), 'Block Group 1, Census Tract 1, Cumberland County, Maine'),
       censusgeo((('state', '23'), ('county', '005'), ('tract', '000100'), ('block group', '2')), 'Block Group 2, Census Tract 1, Cumberland County, Maine'),
       censusgeo((('state', '23'), ('county', '005'), ('tract', '000100'), ('block group', '3')), 'Block Group 3, Census Tract 1, Cumberland County, Maine'),
       censusgeo((('state', '23'), ('county', '005'), ('tract', '000200'), ('block group', '1')), 'Block Group 1, Census Tract 2, Cumberland County, Maine'),
       censusgeo((('state', '23'), ('county', '005'), ('tract', '000200'), ('block group', '2')), 'Block Group 2, Census Tract 2, Cumberland County, Maine')],
      dtype=object)
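To see why the index looks like plain strings, here is a small sketch that runs without the Census API. `FakeGeo` is a hypothetical stand-in for censusgeo, not the real class; it assumes only a `params()` method and a string form equal to the place name. It suggests that pandas keeps such objects in an `object`-dtype index and prints their string form.

```python
import pandas as pd


class FakeGeo:
    """Hypothetical stand-in for censusdata.censusgeo, for illustration only."""

    def __init__(self, geo, name):
        self.geo = geo
        self.name = name

    def params(self):
        # Return the tuple of (level, code) pairs
        return self.geo

    def __str__(self):
        return self.name


geos = [
    FakeGeo((('state', '23'), ('county', '005'), ('tract', '000100'), ('block group', '1')),
            'Block Group 1, Census Tract 1, Cumberland County, Maine'),
    FakeGeo((('state', '23'), ('county', '005'), ('tract', '000100'), ('block group', '2')),
            'Block Group 2, Census Tract 1, Cumberland County, Maine'),
]
df = pd.DataFrame({'C02003_001E': [508, 890]}, index=geos)

index_dtype = df.index.dtype                   # dtype of the index as stored
first_type = type(df.index[0])                 # class of the first index element
first_is_str = isinstance(df.index[0], str)    # whether that element is a real string
print(index_dtype, first_type.__name__, first_is_str)
```

Checking `type(df.index[0])` is a quicker way to find this out than reading the printed frame.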

What I want is to convert block group and census tract to columns, but I can't for the life of me figure out how to do so. Some things I tried are:

# I thought the index was a string at first because pandas prints it out
# like a string unless you magically know to use the `values` attribute to
# show you that it's an object
#
# make it a column (reset_index returns a new frame, so assign it back)
df = df.reset_index()

# pull the block group out with a regex
df['block group'] = df['index'].str.extract(r'Block Group (\d+)', expand=True)

When the block group columns were NaNs, I got suspicious that it wasn't a string and managed to figure out what it was. Next up I tried stuff like this:
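The NaNs can be reproduced in isolation. In this sketch `FakeGeo` is again a hypothetical stand-in for censusgeo (assumed to have a string form equal to the place name). As far as I can tell, the `.str` methods skip elements that are not real strings, so the regex never runs against the objects; converting the column to strings first is one possible workaround.

```python
import pandas as pd


class FakeGeo:
    """Hypothetical stand-in for censusdata.censusgeo, for illustration only."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


col = pd.Series(
    [
        FakeGeo('Block Group 1, Census Tract 1, Cumberland County, Maine'),
        FakeGeo('Block Group 2, Census Tract 1, Cumberland County, Maine'),
    ],
    name='index',
)

# Run the regex directly on the column of objects
on_objects = col.str.extract(r'Block Group (\d+)', expand=True)

# Convert each object to its string form first, then run the same regex
on_strings = col.astype(str).str.extract(r'Block Group (\d+)', expand=True)

print(on_objects[0].isna().all())
print(on_strings[0].tolist())
```

The string route depends on the place name staying in the same format, so it is more fragile than reading the geography parameters off the object.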

df['test'] = lambda x: dict(x.index.params)

But it turns out you can't pass a lambda to build a column in this way - you just get a column full of lambdas.
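A small sketch of that behavior on a toy frame (the column names here are made up for illustration): plain assignment treats the function as a single value and copies it into every row, whereas `DataFrame.assign` is one place where pandas does call a function you hand it.

```python
import pandas as pd

df = pd.DataFrame({'n': [1, 2, 3]})

# Plain assignment broadcasts the function object itself into every row
df['test'] = lambda x: x
all_callable = all(callable(v) for v in df['test'])

# DataFrame.assign calls the function with the frame and uses the result
df2 = df.assign(doubled=lambda d: d['n'] * 2)

print(all_callable)
print(df2['doubled'].tolist())
```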

So I tried stuff like this:

df['test'] = dict(df.index.params)['Block Group']

but that just fails because df.index is a pandas Index, which has no params attribute of its own and won't forward the lookup to each index member.

I read all of the relevant pandas documentation that I could find:

df['block group'] = df.index.map(lambda x: dict(x.params())['block group'])
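Here is the same idea as a sketch that runs without the Census API and pulls out the tract as well. `FakeGeo` is a hypothetical stand-in for censusgeo; the only thing it assumes is a `params()` method returning the tuple of `(level, code)` pairs shown in the `df.index.values` output earlier.

```python
import pandas as pd


class FakeGeo:
    """Hypothetical stand-in for censusdata.censusgeo, for illustration only."""

    def __init__(self, geo, name):
        self.geo = geo
        self.name = name

    def params(self):
        # Return the tuple of (level, code) pairs
        return self.geo

    def __str__(self):
        return self.name


def geo(tract, block_group):
    """Build a FakeGeo for Cumberland County, Maine (illustrative helper)."""
    pairs = (('state', '23'), ('county', '005'),
             ('tract', tract), ('block group', block_group))
    return FakeGeo(pairs, f'Block Group {block_group}, Census Tract {int(tract) // 100}')


df = pd.DataFrame(
    {'C02003_001E': [508, 890, 891]},
    index=[geo('000100', '1'), geo('000100', '2'), geo('000200', '1')],
)

# The Index itself has no `params`; only its elements do
index_has_params = hasattr(df.index, 'params')

# Index.map calls the function once per index element
df['block group'] = df.index.map(lambda g: dict(g.params())['block group'])
df['tract'] = df.index.map(lambda g: dict(g.params())['tract'])

print(df[['block group', 'tract']].to_dict('list'))
```

The keys are the lowercase names from the geography tuples (`'block group'`, `'tract'`), not the capitalized words in the printed place name.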