Created
January 12, 2020 01:51
Merge Similar Datasets
import pandas as pd


def merge_similar(files=None, encoding=None):
    """
    Concatenates datasets with similar, but not necessarily identical, columns.
    A column found in one dataframe but missing from another is created there
    as an empty column, so the result holds the union of all columns.
    """
    merged = None
    for file in files or []:
        df = pd.read_csv(file, encoding=encoding)
        if merged is None:
            # First file: nothing to align against yet
            merged = df
        else:
            # Union of both column sets, in first-seen order
            new_cols = [col for col in df.columns if col not in merged.columns]
            all_cols = list(merged.columns) + new_cols
            for col in [col for col in all_cols if col not in merged.columns]:
                merged[col] = None
            for col in [col for col in all_cols if col not in df.columns]:
                df[col] = None
            merged = pd.concat([merged, df])
    # Return an empty dataframe (not a list) when no files were given
    return merged if merged is not None else pd.DataFrame()
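
For comparison, `pd.concat` aligns dataframes on column names by default (`join="outer"`) and fills the gaps with `NaN`, so the same union-of-columns result can be had without adding the empty columns by hand. The block below is a minimal sketch of that behaviour, not the gist author's method. The in-memory CSVs `csv_a` and `csv_b` and their column names are hypothetical stand-ins for real files.

```python
import io

import pandas as pd

# Hypothetical CSV sources that share only the "id" column
csv_a = io.StringIO("id,name\n1,Ann\n2,Bob\n")
csv_b = io.StringIO("id,city\n3,Oslo\n")

frames = [pd.read_csv(src) for src in (csv_a, csv_b)]

# join="outer" keeps the union of columns, sort=False keeps them in
# first-seen order, and ignore_index=True renumbers the rows 0..n-1
combined = pd.concat(frames, join="outer", sort=False, ignore_index=True)

print(list(combined.columns))
print(combined)
```

Rows from the first source end up with a missing `city` and the row from the second with a missing `name`. Unlike `merge_similar`, which keeps each file's original row index, this sketch passes `ignore_index=True` so the combined index has no duplicates.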