Last active
August 26, 2023 00:48
Helper function to compare two DataFrames and find rows which are unique or shared.
"""Find Symmetric Differences between two Pandas DataFrames."""


def dataframe_difference(df1, df2, which=None):
    """Find rows which are different.

    `which` may be 'left_only', 'right_only', or 'both'; None returns the
    rows found in only one of the two DataFrames.
    """
    # Outer merge on all shared columns; indicator=True adds a '_merge' column.
    comparison_df = df1.merge(
        df2,
        indicator=True,
        how='outer'
    )
    if which is None:
        diff_df = comparison_df[comparison_df['_merge'] != 'both']
    else:
        diff_df = comparison_df[comparison_df['_merge'] == which]
    # Assumes the 'data/' directory already exists.
    diff_df.to_csv('data/diff.csv')
    return diff_df
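For reference, here is a minimal sketch of what the outer merge with `indicator=True` produces before any filtering. The frames `left` and `right` and their contents are made up for illustration and are not part of the gist.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not from the gist).
left = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})
right = pd.DataFrame({'id': [2, 3, 4], 'name': ['b', 'c', 'd']})

# With no `on` argument, merge joins on every column the frames share.
merged = left.merge(right, indicator=True, how='outer')

# Map each id to its '_merge' flag: 'left_only', 'right_only', or 'both'.
merge_flags = dict(zip(merged['id'], merged['_merge'].astype(str)))
print(merge_flags)
```

Each row is tagged by where it was found, and `dataframe_difference` filters on that tag.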
Thanks for this helper function!
Might as well help with some questions while I am here:
The code selects != 'both' (not equal to 'both') when no value is passed for the which parameter, so you only get the left_only and right_only rows.
To get the rows present in both DataFrames, try:
dataframe_difference(wk15,wk16,which='both')
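A rough sketch of the other `which` values, in case it helps. `rows_by_merge_flag` is a hypothetical variant of the gist's function that skips the CSV write so it runs anywhere, and the contents of `wk15` and `wk16` are made up here.

```python
import pandas as pd


def rows_by_merge_flag(df1, df2, which=None):
    """Hypothetical variant of dataframe_difference without the CSV write."""
    merged = df1.merge(df2, indicator=True, how='outer')
    if which is None:
        # Default: rows that appear in only one of the two frames.
        return merged[merged['_merge'] != 'both']
    return merged[merged['_merge'] == which]


# Made-up weekly data for illustration (an assumption, not from the thread).
wk15 = pd.DataFrame({'id': [1, 2, 3]})
wk16 = pd.DataFrame({'id': [2, 3, 4]})

shared = rows_by_merge_flag(wk15, wk16, which='both')           # in both weeks
only_wk15 = rows_by_merge_flag(wk15, wk16, which='left_only')   # only in wk15
only_wk16 = rows_by_merge_flag(wk15, wk16, which='right_only')  # only in wk16
either_only = rows_by_merge_flag(wk15, wk16)                    # in exactly one week
```

The first DataFrame passed in is the "left" one, so swapping the argument order swaps what left_only and right_only return.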