Skip to content

Instantly share code, notes, and snippets.

@bhuiyanmobasshir94
Created November 28, 2019 16:46
Show Gist options
  • Save bhuiyanmobasshir94/c6f2a758372388a71ace5a3a535ce075 to your computer and use it in GitHub Desktop.
Save bhuiyanmobasshir94/c6f2a758372388a71ace5a3a535ce075 to your computer and use it in GitHub Desktop.
def column_dropper(df, threshold):
# Takes a dataframe and threshold for missing values. Returns a dataframe.
total_records = df.count()
for col in df.columns:
# Calculate the percentage of missing values
missing = df.where(df[col].isNull()).count()
missing_percent = missing / total_records
# Drop column if percent of missing is more than threshold
if missing_percent > threshold:
df = df.drop(col)
return df
# Drop columns that are more than 60% missing
df = column_dropper(df, .6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment