@ortsed
Created January 12, 2020 01:51
Merge Similar Datasets
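import pandas as pd


def merge_similar(files=(), encoding=None):
    """
    Concatenate CSV datasets whose columns are similar but not necessarily identical.
    A column missing from one dataframe but present in another is added as an empty column.
    """
    merged = None
    for file in files:
        df = pd.read_csv(file, encoding=encoding)
        if merged is None:
            # First file: nothing to align against yet
            merged = df
            continue
        # Add columns that only the new dataframe has
        for col in [col for col in df.columns if col not in merged.columns]:
            merged[col] = None
        # Add columns that only the accumulated dataframe has
        for col in [col for col in merged.columns if col not in df.columns]:
            df[col] = None
        # ignore_index avoids repeated row labels in the combined result
        merged = pd.concat([merged, df], ignore_index=True)
    # Return an empty dataframe (not a list) when no files were given
    return merged if merged is not None else pd.DataFrame()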
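
The function above fills in missing columns by hand before concatenating. For comparison, here is a minimal sketch of the same idea that relies on `pd.concat` alone, whose default outer join already keeps the union of columns and fills the gaps with NaN. The name `merge_similar_concat` and the sample data are hypothetical and are not part of the original gist.

```python
import io

import pandas as pd


def merge_similar_concat(files, encoding=None):
    """Hypothetical variant: let pd.concat align the columns itself."""
    frames = [pd.read_csv(f, encoding=encoding) for f in files]
    if not frames:
        return pd.DataFrame()
    # join="outer" (the default) keeps the union of all columns;
    # sort=False keeps columns in order of first appearance
    return pd.concat(frames, ignore_index=True, sort=False)


# Two small in-memory CSVs (made-up data) with overlapping columns
first = io.StringIO("id,name\n1,Ann\n2,Bob\n")
second = io.StringIO("id,city\n3,Oslo\n")

result = merge_similar_concat([first, second])
print(result)
```

One difference to keep in mind: here the missing cells come out as NaN, whereas assigning `None` by hand leaves object-typed columns until the concat.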