Ryan Behdad ryanbehdad
ryanbehdad / SHAP_Full_Explanation.py
Last active October 16, 2020 03:36
SHAP Full Explanation
# SHAP's force plot does not label all the important features.
# We usually need the top (20) features that affect a decision for a particular instance.
# In addition to their names, the features' values and their Shapley values are also required.
# The snippet below:
# 1. creates a dataframe containing all the features, their Shapley values and their actual values
# 2. exports the dataframe to a CSV file
# 3. displays the force plot
import shap
shap.initjs()
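The preview above stops before the dataframe is built. As a rough sketch of steps 1 and 2, assuming the Shapley values for one instance are already available as a plain array, something like the following could work. The helper name `shap_explanation_table`, its parameters, and the sample numbers are hypothetical and are not taken from the gist.

```python
import numpy as np
import pandas as pd

def shap_explanation_table(feature_names, feature_values, shap_values, top_n=20):
    """Hypothetical helper: one row per feature, ranked by |Shapley value|."""
    table = pd.DataFrame({
        "feature": list(feature_names),
        "value": list(feature_values),
        "shap_value": np.asarray(shap_values, dtype=float),
    })
    # Rank by magnitude so that strong negative contributions are kept too
    order = table["shap_value"].abs().sort_values(ascending=False).index
    return table.loc[order].head(top_n).reset_index(drop=True)

# Made-up numbers for illustration only
example = shap_explanation_table(
    ["age", "income", "tenure"], [42, 55000.0, 3], [0.10, -0.35, 0.02], top_n=2
)
# example.to_csv("shap_explanation.csv", index=False)  # optional CSV export
```

Ranking by absolute value is a design choice made here: a feature that pushes the prediction down strongly is usually as interesting as one that pushes it up.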
ryanbehdad / compare_df_columns.py
Last active January 12, 2021 07:51
Compare the columns of two dataframes (including their dtypes)
def compare_df_columns(df1, df2):
    """
    Compare the columns of two dataframes (including their types)
    """
    matched = True
    # Compare number of rows
    if df1.shape[0] != df2.shape[0]:
        print(f'Row numbers do not match {df1.shape[0]:,} vs {df2.shape[0]:,}')
        matched = False
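The preview ends after the row-count check, so the column and dtype comparison itself is not shown. One possible way to do that part is sketched below. The helper name `column_differences`, its return shape, and the sample frames are assumptions for illustration, not the gist's actual code.

```python
import pandas as pd

def column_differences(df1, df2):
    """Hypothetical helper: report columns that are missing or differ in dtype."""
    only_in_df1 = [c for c in df1.columns if c not in df2.columns]
    only_in_df2 = [c for c in df2.columns if c not in df1.columns]
    # Compare dtype names as strings so any pair of dtypes can be compared safely
    dtype_mismatches = {
        c: (str(df1[c].dtype), str(df2[c].dtype))
        for c in df1.columns
        if c in df2.columns and str(df1[c].dtype) != str(df2[c].dtype)
    }
    return only_in_df1, only_in_df2, dtype_mismatches

# Made-up frames for illustration only
left = pd.DataFrame({"id": [1, 2], "score": [0.5, 0.7], "name": ["a", "b"]})
right = pd.DataFrame({"id": [1, 2], "score": [1, 2], "city": ["x", "y"]})
only_left, only_right, mismatches = column_differences(left, right)
```

Returning the differences instead of printing them makes the result easy to assert on in tests. A caller can still print them, as the original function does for the row counts.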
ryanbehdad / nbstripout.md
Last active January 16, 2025 04:41
nbstripout

Using nbstripout to Clean Outputs of Jupyter Notebooks before Commits

To improve version control for Jupyter notebooks, consider using nbstripout. This tool automatically strips cell outputs from notebooks before they are committed to Git.

Benefits

  • Cleaner Version History: Eliminates unnecessary output data, making diffs more readable.
  • Reduced Repository Size: Keeps the repository lightweight by excluding bulky embedded outputs such as images and large tables.
  • Consistent Results: Committed notebooks carry no stale outputs, so results always come from re-running the code in the current environment.
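To make "removing outputs" concrete, here is a minimal sketch of the idea on a notebook loaded as a plain dictionary in the nbformat 4 layout. It is not nbstripout's implementation, which also handles things such as metadata and Git filter integration. The helper name `strip_outputs` and the sample notebook are hypothetical.

```python
import json

def strip_outputs(notebook):
    """Hypothetical helper: return a copy of an nbformat-4 notebook dict without outputs."""
    cleaned = json.loads(json.dumps(notebook))  # simple deep copy of JSON-compatible data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop stored results
            cell["execution_count"] = None  # drop the In[n] counter
    return cleaned

# Made-up two-cell notebook for illustration only
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 3,
            "outputs": [{"output_type": "execute_result", "data": {"text/plain": ["2"]},
                         "metadata": {}, "execution_count": 3}],
        },
    ],
}
stripped = strip_outputs(notebook)
```

In practice the tool is usually enabled once per repository with `nbstripout --install`, which registers it as a Git filter so that the outputs stay visible in the working copy.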