@martinsotir
Created March 28, 2018 18:56
import itertools

import pandas as pd


def flatten_df(df, list_col, elem_col_name="elem"):
    """Convert a column of lists into individual rows within a dataframe.

    Each list element gets its own row; the other columns are repeated
    once per element.

    Adapted from https://stackoverflow.com/a/48532692

    This function can be used on a dask dataframe:
    ```python
    df.map_partitions(lambda x: flatten_df(x, "list_col", elem_col_name="elem")).clear_divisions()
    # I am not sure whether clear_divisions() is required
    ```
    """
    # Number of elements in each row's list (.str.len() also works on list values).
    len_lists = df[list_col].str.len()
    return pd.DataFrame({
        # Repeat every non-list column once per list element.
        **{col: df[col].repeat(len_lists) for col in df.columns.drop(list_col)},
        # Concatenate all lists into one flat column.
        elem_col_name: list(itertools.chain.from_iterable(df[list_col].values)),
    })
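
A minimal usage sketch on a plain pandas dataframe, to show what the function above returns. The sample data and the column names `id`, `tags`, and `tag` are hypothetical and chosen only for illustration; the function body is repeated so the snippet runs on its own.

```python
import itertools

import pandas as pd


def flatten_df(df, list_col, elem_col_name="elem"):
    # Same logic as the function above, repeated so this sketch is self-contained.
    len_lists = df[list_col].str.len()
    return pd.DataFrame({
        **{col: df[col].repeat(len_lists) for col in df.columns.drop(list_col)},
        elem_col_name: list(itertools.chain.from_iterable(df[list_col].values)),
    })


# Hypothetical sample data: two rows, each holding a list of tags.
sample = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})

# One output row per list element; "id" is repeated to match.
flat = flatten_df(sample, "tags", elem_col_name="tag")
print(flat)
```

The result keeps the original index labels, so a row that produced several elements yields a repeated label; calling `flat.reset_index(drop=True)` gives a fresh index if that is not wanted. On pandas 0.25 or later, the built-in `DataFrame.explode` may cover the same use case.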