@makmanalp
Last active August 29, 2015 14:02
Pandas merge: detecting where data came from
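import numpy as np
from pandas import DataFrame, merge

# Two frames whose key columns have *different* names: "id" on the left,
# "user_id" on the right. Keys 0-9 and 3-12 overlap only on 3-9.
df = DataFrame(np.random.randn(10, 2), columns=["id", "sex"])
df2 = DataFrame(np.random.randn(10, 2), columns=["user_id", "name"])
df["id"] = range(10)
df2["user_id"] = range(3, 13)

# An outer join keeps every row from both frames and fills the gaps with NaN.
merge(df, df2, left_on="id", right_on="user_id", how="outer")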
"""
     id       sex  user_id      name
0     0 -0.254309      NaN       NaN
1     1 -0.363123      NaN       NaN
2     2 -0.408873      NaN       NaN
3     3 -1.209845        3  0.578440
4     4  0.952290        4 -1.336396
5     5 -0.091704        5  0.255794
6     6  0.984578        6 -0.469222
7     7 -0.694126        7  1.197256
8     8  0.369942        8 -0.656366
9     9  1.544090        9 -0.975548
10  NaN       NaN       10 -1.827958
11  NaN       NaN       11 -1.523407
12  NaN       NaN       12 -0.785032
"""
# Because "id" came only from df (left) and "user_id" only from df2 (right),
# a NaN in "id" means the row existed only in df2, a NaN in "user_id" means
# it existed only in df, and a row with both keys filled matched in both frames.
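A minimal sketch of turning that NaN check into an explicit label column, assuming a reasonably recent pandas (it relies on `Series.isna` and on the `indicator` argument of `merge`). The `source` column name is hypothetical and chosen only for this example. `indicator=True` is pandas' own built-in way to record the same information in a categorical `_merge` column (added in pandas 0.17, after this gist was written); here it serves only as a cross-check.

```python
import numpy as np
import pandas as pd

# Same setup as above: keys 0-9 on the left, 3-12 on the right.
df = pd.DataFrame({"id": range(10), "sex": np.random.randn(10)})
df2 = pd.DataFrame({"user_id": range(3, 13), "name": np.random.randn(10)})

merged = pd.merge(df, df2, left_on="id", right_on="user_id", how="outer")

# Label each row from the NaN pattern of the two key columns.
# "source" is a hypothetical column name chosen for this example.
merged["source"] = "both"
merged.loc[merged["id"].isna(), "source"] = "right_only"      # key missing from df
merged.loc[merged["user_id"].isna(), "source"] = "left_only"  # key missing from df2

# Cross-check against pandas' built-in origin flag (categorical "_merge" column).
checked = pd.merge(df, df2, left_on="id", right_on="user_id", how="outer",
                   indicator=True)
assert (merged["source"] == checked["_merge"].astype(str)).all()

# counts: both 7, left_only 3, right_only 3
print(merged["source"].value_counts())
```

The NaN test only works because the key columns have different names; merging on a shared key with `on=` yields a single key column, and the origin is no longer visible from it. It would also mislabel a row whose key was already NaN in its input frame, whereas the `_merge` column still reports that row's true origin.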