Skip to content

Instantly share code, notes, and snippets.

@ddrscott
Last active March 3, 2021 23:25
Show Gist options
  • Save ddrscott/1e7353da54ea2d162ac0bcbcf27fee25 to your computer and use it in GitHub Desktop.
Save ddrscott/1e7353da54ea2d162ac0bcbcf27fee25 to your computer and use it in GitHub Desktop.
Use Pandas to spread/flatten dictionary's items into individual columns
id color meta
1 red {"x":123}
2 green {"y":456}
3 blue {"x":789,"y":234}
# Thanks:
# https://stackoverflow.com/a/55279799
# https://stackoverflow.com/a/25512372
import pandas as pd
import json
def flatten_json(text):
try:
if text.startswith('{'):
return json.loads(text)
except Exception as e:
print("Cannot parse:", text)
print(e)
return None
# Read CSV data using converter function.
df = pd.read_csv('data.csv', escapechar='"', converters={'meta': flatten_json})
# Get all the data as a list of dictionaries
records = json.loads(df.to_json(orient="records"))
# Read all the data back into a Dataframe
flat_df = pd.json_normalize(records)
@ddrscott
Copy link
Author

ddrscott commented Mar 3, 2021

Notes:

  • This strategy works well on smaller data sets. For larger datasets, we're constrained by system memory.
  • print statements should use logger instead.
  • startswith detection can be made more sophisticated as needed.
  • OMG! This actually works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment