Remove Highly Correlated Features
import numpy as np

def remove_highly_correlated_features(df, ths=0.99):
    """Return the columns whose absolute correlation with another column exceeds ths."""
    print("The data has: {} features".format(len(df.columns)))
    correlations = df.corr()
    corr_matrix = correlations.abs()
    # Mask the diagonal and upper triangle so each pair is checked only once
    mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
    tri_df = corr_matrix.mask(mask)
    # List column names of highly correlated features (|r| > ths)
    to_drop = [c for c in tri_df.columns if (tri_df[c] > ths).any()]
    print("The data has: {} highly correlated predictors".format(len(to_drop)))
    # The columns are only listed, not dropped; the caller decides what to remove
    return to_drop, correlations
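
The function only returns the list of candidate columns; it does not modify the DataFrame. The sketch below walks through the same masking steps on a small made-up DataFrame and then drops the flagged columns. The names `demo`, `upper`, and `reduced` and the toy values are assumptions for illustration, not part of the gist.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "b" is an exact multiple of "a", "c" is only weakly related.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})

corr_matrix = demo.corr().abs()

# Hide the diagonal and upper triangle so each pair is compared only once.
upper = np.triu(np.ones_like(corr_matrix, dtype=bool))
tri_df = corr_matrix.mask(upper)

# A column is flagged when any entry below the diagonal exceeds the threshold.
to_drop = [c for c in tri_df.columns if (tri_df[c] > 0.99).any()]

# Dropping is a separate, explicit step.
reduced = demo.drop(columns=to_drop)
print(to_drop, list(reduced.columns))
```

Because the lower triangle is kept, it is the earlier column of each correlated pair that gets flagged (here "a" rather than "b"). Reorder the columns beforehand if a particular feature should be preferred.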