@jonshaffer
Last active September 5, 2025 12:05

Legend: ★ = must-know, ◇ = nice-to-have.

Classification

  • ★ Logistic Regression — linear decision boundary; typically well-calibrated probabilities; fast baseline.
  • ◇ k-Nearest Neighbors — vote of nearby points; non-parametric, can be slow at scale.
  • ★ Naive Bayes (Gaussian/Multinomial) — assumes features are conditionally independent given the class; strong baseline for text.
  • ★ Support Vector Machines (SVM) — max-margin hyperplane; kernels handle nonlinearity.
  • ★ Decision Tree — if-then rules; interpretable, but prone to overfitting without pruning or depth limits.
  • ★ Random Forest — bagged trees; strong default, robust to noise, handles mixed types.
  • ★ Gradient Boosting (XGBoost/LightGBM/CatBoost) — sequentially boosted trees; often state-of-the-art on tabular data.
  • ★ Neural Networks (MLP/CNN/RNN/Transformers) — flexible function approximators; needs tuning.
  • ◇ Linear/Quadratic Discriminant Analysis (LDA/QDA) — Gaussian class-conditional models giving linear (LDA) or quadratic (QDA) boundaries; quick, simple.
  • ◇ Gaussian Process Classification — probabilistic with uncertainty; expensive for large n.
  • ◇ RuleFit / RIPPER — sparse rule sets; interpretable alternatives to trees.
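A minimal sketch of the Logistic Regression entry, assuming plain Python with no third-party libraries: it learns a linear decision boundary by batch gradient descent on the mean log-loss. The function names (`sigmoid`, `fit_logistic`, `predict_proba`), the learning rate, epoch count, and toy data are illustrative assumptions, not any library's API; in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 1000,
    l2: float = 0.0,
) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss (optional L2 penalty on weights only)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the logit for one sample
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of class 1 for a single sample."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy 1-D data: class 0 for small x, class 1 for large x.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [0.0]), predict_proba(w, b, [5.0]))
```

The boundary is the set of points where `w·x + b = 0`, which is why the model is called linear even though its output passes through a sigmoid.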

Regression

  • ★ Ordinary Least Squares — linear model fit by minimizing squared error (MSE); fast baseline.
  • ★ Ridge / Lasso / Elastic Net — linear with L2 / L1 / combined penalties; L2 stabilizes fits under multicollinearity, L1 zeroes out coefficients (feature selection).
  • ◇ Polynomial Regression — linear in expanded features; captures simple curvature.
  • ◇ Support Vector Regression (SVR) — ε-insensitive loss; less sensitive to outliers than squared loss, slower at scale.
  • ◇ Decision Tree Regressor — piecewise constant; interpretable, can overfit.
  • ★ Random Forest Regressor — averages many trees; strong default, needs little tuning.
  • ★ Gradient Boosted Trees (XGB/LGBM/CatBoost) — powerful on tabular data; handles nonlinearity and interactions.
  • ◇ kNN Regressor — local averaging; non-parametric, sensitive to feature scaling.
  • ◇ Gaussian Process Regression — smooth functions with uncertainty; O(n³) training cost in the number of samples.
  • ◇ Robust Regression (Huber/RANSAC/Theil–Sen) — less sensitive to outliers.
  • ◇ Generalized Linear Models (Poisson/Gamma/Tweedie) — non-Gaussian targets with link functions.
  • ◇ Quantile Regression — predicts conditional quantiles; good for asymmetric loss.
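To make the OLS and Ridge entries concrete, here is a small sketch, assuming a single feature and plain Python, of the closed-form ridge solution with an unpenalized intercept. With centered data, minimizing the squared error plus `alpha * slope**2` gives `slope = Sxy / (Sxx + alpha)`, so `alpha = 0` recovers OLS and larger `alpha` shrinks the slope toward zero. The function name `ridge_1d` and the toy data are illustrative assumptions.

```python
from typing import Sequence, Tuple


def ridge_1d(x: Sequence[float], y: Sequence[float], alpha: float = 0.0) -> Tuple[float, float]:
    """Closed-form ridge fit of y ~ slope * x + intercept for one feature.

    The intercept is not penalized, so x and y are centered first; alpha=0 gives OLS.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / (sxx + alpha)  # the L2 penalty alpha * slope**2 adds alpha to the denominator
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data generated exactly by y = 2x + 1.
xs, ys = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
print(ridge_1d(xs, ys))             # OLS recovers the true line
print(ridge_1d(xs, ys, alpha=5.0))  # the penalty shrinks the slope
```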

Clustering

  • ★ k-Means — centroid-based, spherical clusters; fast, needs k.
  • ◇ k-Medoids (PAM) — like k-means but uses actual data points (medoids) as centers; more robust to outliers.
  • ★ Agglomerative Hierarchical (Ward/complete/etc.) — builds a dendrogram; k can be chosen afterward by cutting it.
  • ★ Gaussian Mixture Models (EM) — soft probabilistic clusters; elliptical shapes.
  • ★ DBSCAN — density-based; finds arbitrary shapes & noise, no k, needs ε/minPts.
  • ◇ HDBSCAN — hierarchical DBSCAN variant; no ε to set, handles varying density, less parameter-sensitive.
  • ◇ OPTICS — orders points by density; handles variable density.
  • ◇ Spectral Clustering — graph Laplacian; good for non-convex manifolds.
  • ◇ Mean Shift — mode seeking; the number of clusters follows from the bandwidth rather than a preset k.
  • ◇ BIRCH — incremental; efficient for very large datasets.
  • ◇ Self-Organizing Maps (SOM) — neural grid preserving topology.
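A minimal sketch of the k-Means entry, assuming plain Python: Lloyd's algorithm alternates an assignment step (each point goes to its nearest centroid) and an update step (each centroid moves to the mean of its points) until assignments stop changing. Initialization here simply picks k random data points, which is a simplifying assumption (libraries commonly default to k-means++); `kmeans` and the toy points are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, n_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm with centroids initialised from k random data points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Hypothetical data: two well-separated blobs.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(pts, k=2)
print(labels, centroids)
```

Because the objective is non-convex, results can depend on initialization; running several seeds and keeping the lowest within-cluster sum of squares is a common remedy.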

Dimensionality Reduction

  • ★ PCA — orthogonal variance directions; linear, fast.
  • ◇ Truncated SVD — PCA-like reduction without centering, so it works directly on sparse data (e.g., TF-IDF).
  • ◇ Kernel PCA — nonlinear variant via kernels.
  • ★ t-SNE — preserves local neighborhoods; great for visualization, not for downstream features.
  • ★ UMAP — usually faster than t-SNE and often keeps more global structure; popular for viz & features.
  • ◇ Isomap — geodesic distances on manifolds.
  • ◇ LLE (and HLLE) — locally linear embeddings; manifold learning.
  • ◇ Autoencoders (deep) — nonlinear compression via NN bottleneck.
  • ◇ NMF — non-negative factors; parts-based representations (e.g., topics).
  • ◇ ICA — independent components; separates mixed signals.
  • ◇ Factor Analysis — latent variable model for covariance structure.
  • ◇ Random Projections — random matrices approximately preserve pairwise distances (Johnson–Lindenstrauss); very fast.
  • ◇ Fisher’s LDA — supervised projection maximizing class separability.
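The PCA entry can be illustrated with a short sketch, assuming plain Python: center the data, form the sample covariance matrix, and find its leading eigenvector by power iteration, which is the direction of maximum variance. Production libraries typically use an SVD instead and return all components; `first_principal_component`, `project`, and the toy data are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def first_principal_component(X: Sequence[Sequence[float]], n_iter: int = 200) -> Tuple[List[float], List[float]]:
    """Leading eigenvector of the sample covariance matrix via power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # PCA centers the data
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # arbitrary start; fails only if exactly orthogonal to the top eigenvector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            break  # zero covariance (or start vector in its null space): nothing to extract
        v = [wi / norm for wi in w]
    return v, means


def project(X: Sequence[Sequence[float]], v: Sequence[float], means: Sequence[float]) -> List[float]:
    """1-D scores: each centered row dotted with the component."""
    return [sum((row[j] - means[j]) * v[j] for j in range(len(v))) for row in X]


# Hypothetical points lying exactly on the line y = x.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
v, means = first_principal_component(X)
print(v, project(X, v, means))
```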

Model Selection / Tuning

  • ★ Hold-out Split — simple train/validation/test evaluation.
  • ★ k-Fold Cross-Validation (Stratified/Group/Time-Series) — robust generalization estimates.
  • ◇ Grid Search — exhaustive hyperparameter scan over a grid.
  • ★ Random Search — random sampling; surprisingly strong baseline.
  • ◇ Bayesian Optimization (GP/TPE/SMBO) — sample-efficient hyperparameter tuning.
  • ◇ Successive Halving / Hyperband / ASHA — early-stops poor configs to save compute.
  • ★ Early Stopping — stop training when the validation score stops improving (boosting/NNs).
  • ◇ Information Criteria (AIC/BIC) — penalized likelihood for model order.
  • ◇ Nested Cross-Validation — inner loop tunes, outer loop estimates performance; avoids the optimistic bias of reusing tuning folds.
  • ◇ Bootstrap (.632+) — resampling-based error estimation, small-data friendly.
  • ★ Ensembling/Stacking/Blending — combine models; often beats single best.
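As a sketch of the k-Fold Cross-Validation entry, assuming plain Python: indices are shuffled once, cut into k folds whose sizes differ by at most one, and each fold serves once as validation while the rest is used for fitting. This is plain (unstratified) k-fold; stratified, group, and time-series variants change only how indices are assigned to folds. `k_fold_indices` and the mean-predictor baseline are hypothetical helpers, not a library API.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, shuffle: bool = True, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices) for each of k folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample so fold sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


def cv_mse_of_mean_baseline(y: Sequence[float], k: int = 5) -> float:
    """Average validation MSE of a 'predict the training mean' model."""
    scores = []
    for train, val in k_fold_indices(len(y), k):
        mean = sum(y[i] for i in train) / len(train)  # "fit" on the training fold only
        scores.append(sum((y[i] - mean) ** 2 for i in val) / len(val))
    return sum(scores) / len(scores)


for train, val in k_fold_indices(10, 3, seed=42):
    print(len(train), sorted(val))
```

Even a trivial baseline like the mean predictor gives a useful reference score that any real model evaluated on the same folds should beat.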

Preprocessing / Feature Engineering

  • ★ Scaling — Standard, Min-Max, MaxAbs, Robust; essential for distance-based, kernel, and regularized or gradient-trained models.
  • ◇ Power Transforms — Box-Cox, Yeo–Johnson; stabilize variance, Gaussianize.
  • ★ Imputation — Mean/Median, KNNImputer, Iterative/MICE; fills missing values.
  • ★ Categorical Encoding — One-Hot, Ordinal, Target/Mean, Leave-One-Out, Hashing.
  • ★ Text Features — tokenization, n-grams, TF-IDF, subword/BPE; basic NLP pipeline.
  • ◇ Feature Generation — polynomial/interactions, feature hashing, datetime decompositions.
  • ★ Feature Selection — filter (chi²/F-test/MI), wrapper (RFE), embedded (L1/Lasso, tree importances).
  • ◇ Outlier Handling — z-score/Winsorization, Isolation Forest, Local Outlier Factor.
  • ★ Class Imbalance — stratified splits, class weights, oversampling (SMOTE/ADASYN), undersampling.
  • ◇ Time-Series Prep — differencing, rolling stats, seasonal decomposition, lag features.
  • ◇ Image Prep — normalization, resizing, augmentation (flip/rotate/crop).
  • ◇ Target Transforms — log/Box-Cox for skewed regression targets.
  • ★ Leakage-Safe Pipelines — fit transforms only on train; use pipelines to avoid leakage.
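The Scaling and Leakage-Safe Pipelines entries combine in this sketch, assuming plain Python: the scaler learns its means and standard deviations from the training rows only, and the test rows are transformed with those same statistics, so no information from the test set leaks into preprocessing. `SimpleStandardScaler` is a hypothetical class that only mimics the fit/transform naming used by libraries such as scikit-learn, whose `Pipeline` automates this pattern inside cross-validation.

```python
import math
from typing import List, Sequence


class SimpleStandardScaler:
    """Minimal z-score scaler with a fit/transform split (illustrative only)."""

    def fit(self, X: Sequence[Sequence[float]]) -> "SimpleStandardScaler":
        n, d = len(X), len(X[0])
        self.mean_ = [sum(row[j] for row in X) / n for j in range(d)]
        self.scale_: List[float] = []
        for j in range(d):
            var = sum((row[j] - self.mean_[j]) ** 2 for row in X) / n  # population variance
            self.scale_.append(math.sqrt(var) if var > 0 else 1.0)  # constant column: avoid /0
        return self

    def transform(self, X: Sequence[Sequence[float]]) -> List[List[float]]:
        return [[(row[j] - self.mean_[j]) / self.scale_[j] for j in range(len(self.mean_))] for row in X]


# Split FIRST, then fit on the training rows only; the test rows reuse those statistics.
X_train = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
X_test = [[4.0, 10.0]]
scaler = SimpleStandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
print(X_train_s, X_test_s)
```

Fitting the scaler on the combined data instead would let the test row shift the mean and variance, a mild but real form of leakage; the same rule applies to imputers, encoders, and feature selectors.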