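import random

import pandas as pd


def fill_proportionally(col, dataset):
    """Fill NaNs in dataset[col] in place, sampling categories by observed frequency."""
    random.seed(0)
    # value_counts() drops NaN; taking both the categories and the weights
    # from the same Series keeps each category aligned with its own probability
    # (unique() returns order of appearance, value_counts() sorts by frequency)
    proportions = dataset[col].value_counts(normalize=True)
    values = proportions.index.tolist()
    weights = proportions.values.tolist()
    print('Before imputation probability weights:\n', proportions)
    # filling: replace each missing entry with a weighted random draw
    dataset[col] = dataset[col].apply(
        lambda x: random.choices(values, weights=weights)[0] if pd.isnull(x) else x
    )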
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Abhayparashar31/datasets/master/titanic.csv')
### Imputing Missing Categories
fill_proportionally('Embarked', df)
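
One way to sanity-check this kind of imputation is to compare the category proportions before and after filling. The snippet below is a minimal sketch of that check, not part of the original gist: the `toy` DataFrame and its counts are made up for illustration, and the weighted draw is written inline so the example runs without downloading the Titanic CSV.

```python
import random

import pandas as pd

# Hypothetical toy data: 6 "S", 3 "C", 1 "Q", plus 5 missing values
toy = pd.DataFrame({'Embarked': ['S'] * 6 + ['C'] * 3 + ['Q'] + [None] * 5})

# Observed proportions among the non-missing values (NaN is dropped)
before = toy['Embarked'].value_counts(normalize=True)

# Same idea as fill_proportionally: weighted draws from the observed categories
random.seed(0)
missing = toy['Embarked'].isna()
draws = random.choices(before.index.tolist(), weights=before.tolist(),
                       k=int(missing.sum()))
toy.loc[missing, 'Embarked'] = draws

# Proportions after filling, side by side with the originals
after = toy['Embarked'].value_counts(normalize=True)
comparison = pd.DataFrame({'before': before, 'after': after}).fillna(0)
print(comparison)
```

With only a handful of missing rows the two columns will not match exactly; the draws are random, so the proportions are preserved only approximately and the match should tighten as the number of imputed values grows.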