@marcosan93
Last active November 12, 2021 20:23
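import pandas as pd


def balanceDecisions(n_df):
    """
    Rebalances data so that each class/decision is represented equally.

    Only the two least frequent classes in the 'decision' column are kept.
    The latest data is taken to be the last rows of the DF.
    """
    # Counting each class or decision, sorted from rarest to most common
    counts = n_df['decision'].value_counts().sort_values()
    # The number of rows in the least represented class
    low_num = counts.min()
    # Sampling the latest rows of each class from the DF
    df1 = n_df[n_df['decision'] == counts.index[0]].tail(low_num)
    df2 = n_df[n_df['decision'] == counts.index[1]].tail(low_num)
    # Combining the resampled DFs and returning the result
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
    return pd.concat([df1, df2])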
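
The function above handles two classes. A minimal sketch of the same idea for any number of classes is shown below; `balance_latest`, its `label_col` parameter, and the `demo` data are hypothetical names made up for illustration, not part of the original gist. It assumes the rows are ordered oldest to newest, so the last rows of each class are the latest ones.

```python
import pandas as pd


def balance_latest(df, label_col="decision"):
    """Keep the same number of most recent rows for every class (hypothetical helper)."""
    # Size of the rarest class; every class is cut down to this many rows
    low_num = df[label_col].value_counts().min()
    # groupby(...).tail(n) keeps the last n rows of each group
    return df.groupby(label_col).tail(low_num)


# Made-up demo data: 4 "buy", 2 "sell", 2 "hold"
demo = pd.DataFrame({
    "price": [10, 11, 12, 13, 14, 15, 16, 17],
    "decision": ["buy", "buy", "sell", "buy", "hold", "sell", "buy", "hold"],
})

balanced = balance_latest(demo)
print(balanced["decision"].value_counts())
```

Because the two oldest "buy" rows are dropped, each class ends up with two rows. Note that this variant keeps every class, whereas `balanceDecisions` keeps only two.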