@martinsotir
Last active June 7, 2019 22:00
Conditional merge in pandas
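import pandas as pd

def join_part(A, B, cond, left_on, right_on):
    # Equi-join one batch of A with B, then keep only the rows that satisfy cond.
    # cond is assumed to be a callable: it receives the merged frame and returns a boolean mask.
    C = A.merge(B, left_on=left_on, right_on=right_on, how="inner", copy=False)
    return C[cond].copy()

def conditional_join(A, B, cond, left_on, right_on, batch_size=50000):
    # Process A in batches so the unfiltered intermediate join stays small.
    # max(len(A), 1) yields a single empty batch when A is empty, so pd.concat still has input,
    # and avoids the extra empty trailing batch that len(A) + batch_size would produce.
    indices = range(0, max(len(A), 1), batch_size)
    batches = (A.iloc[b_start : b_start + batch_size] for b_start in indices)
    merges = (join_part(subset, B, cond, left_on, right_on) for subset in batches)
    return pd.concat(merges)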
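A possible usage sketch follows. The gist does not say what `cond` is, so this assumes it is a callable that takes the merged frame and returns a boolean mask, which is what `C[cond]` accepts in pandas. The `events` and `windows` frames, their column names, and the tiny `batch_size` are made up for illustration only.

```python
import pandas as pd

# Same logic as the functions above, restated so this snippet runs on its own.
def join_part(A, B, cond, left_on, right_on):
    C = A.merge(B, left_on=left_on, right_on=right_on, how="inner")
    return C[cond].copy()

def conditional_join(A, B, cond, left_on, right_on, batch_size=50000):
    indices = range(0, max(len(A), 1), batch_size)
    batches = (A.iloc[b_start : b_start + batch_size] for b_start in indices)
    merges = (join_part(subset, B, cond, left_on, right_on) for subset in batches)
    return pd.concat(merges)

# Hypothetical data: timestamped events and labelled time windows per sensor.
events = pd.DataFrame({"sensor": ["a", "a", "b", "b"], "t": [1, 5, 2, 9]})
windows = pd.DataFrame({
    "sensor_id": ["a", "a", "b"],
    "start": [0, 4, 0],
    "end": [3, 8, 5],
    "label": ["w1", "w2", "w3"],
})

# Keep a merged row only when the event time falls inside the window [start, end).
def in_window(df):
    return (df["t"] >= df["start"]) & (df["t"] < df["end"])

result = conditional_join(
    events, windows, in_window,
    left_on="sensor", right_on="sensor_id",
    batch_size=2,  # deliberately tiny so that more than one batch is exercised
)
print(result.reset_index(drop=True))
```

Each batch is merged separately, so the concatenated result carries repeated index labels; `reset_index(drop=True)` (or `pd.concat(..., ignore_index=True)` inside the function) gives a clean index if one is needed.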