@suriyadeepan
Last active November 16, 2020 10:06
Blog: Hypothesis Generation
| # | Stage | Description |
|---|-------|-------------|
| 1 | Hypothesis Generation | Study the business problem. Build a conceptual model by developing a deeper understanding of the problem and its domain. Generate hypotheses. |
| 2 | Data Collection | Go out into the wild and collect data based on the generated hypotheses. |
| 3 | Study the Variables | Identify potential predictors using data visualization. |
| 4 | Data Preparation | Clean the data. Fill in missing data points. Scale, normalize, and transform the data as necessary. |
| 5 | Bivariate/Multivariate Analysis | Test the hypotheses you generated earlier. Choose predictors based on their correlation with the target. |
| 6 | Data Transformation | Apply non-linear transformations (such as log) to variables to fish out non-linear relationships with the target variable: log-linear, linear-log, log-log, etc. |
| 7 | Feature Engineering | Engineer new features guided by your intuition about the data. |
| 8 | Model Evaluation | Choose a list of appropriate models and rank them by evaluating them against the validation set. |
| 9 | Hyperparameter Search | Find the optimal hyperparameters for the models. Re-evaluate and re-rank the models. |
| 10 | Ensembling | Stabilize the final model using ensemble methods like averaging, voting, and stacking. |
| 11 | Model Explanation | Extract insights from the model using visualization or explanatory tools like SHAP values. |
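
To make stages 5 and 6 a little more concrete, here is a minimal sketch of how a log transformation can expose a relationship that a plain correlation understates. It is only an illustration, not a prescribed method: the toy data, the `pearson` helper, and the `log_transform_report` function are all hypothetical names introduced here, and the data is deliberately constructed so that the target grows exponentially with the predictor.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_transform_report(x, y):
    """Correlation between predictor and target under each log/linear pairing.

    Assumes every value in x and y is strictly positive, so log is defined.
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    return {
        "linear-linear": pearson(x, y),      # no transformation
        "log-linear": pearson(x, log_y),     # log applied to the target only
        "linear-log": pearson(log_x, y),     # log applied to the predictor only
        "log-log": pearson(log_x, log_y),    # log applied to both
    }


# Hypothetical toy data: the target is an exponential function of the
# predictor, so log(target) is exactly linear in the predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [math.exp(0.8 * v) for v in x]

report = log_transform_report(x, y)
best = max(report, key=lambda name: abs(report[name]))

for name, r in report.items():
    print(f"{name:>13}: r = {r:+.3f}")
print("strongest linear relationship:", best)
```

On this constructed data the log-linear pairing comes out on top, because taking the log of the target turns the exponential curve into a straight line. On real data the picture is rarely this clean, so the correlations are better treated as a hint about which transformation to try than as a final answer.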