Legend: ★ = must-know, ◇ = nice-to-have.
- ★ Logistic Regression — linear decision boundary; usually well-calibrated probabilities, fast baseline.
- ◇ k-Nearest Neighbors — vote of nearby points; non-parametric, can be slow at scale.
- ★ Naive Bayes (Gaussian/Multinomial) — assumes features are conditionally independent given the class; great for text.
- ★ Support Vector Machines (SVM) — max-margin hyperplane; kernels handle nonlinearity.
- ★ Decision Tree — if-then rules; interpretable, prone to overfit without pruning.
- ★ Random Forest — bagged trees; strong default, robust to noise, handles mixed types.
- ★ Gradient Boosting (XGBoost/LightGBM/CatBoost) — boosted trees; often state of the art on tabular data.
- ★ Neural Networks (MLP/CNN/RNN/Transformers) — flexible function approximators; need careful tuning.
- ◇ Linear/Quadratic Discriminant Analysis (LDA/QDA) — Gaussian generative models with linear/quadratic boundaries; quick, simple.
- ◇ Gaussian Process Classification — probabilistic with uncertainty; expensive for large n.
- ◇ RuleFit / RIPPER — sparse rule sets; interpretable alternatives to trees.
- ★ Ordinary Least Squares — linear relationship with MSE objective; fast baseline.
- ★ Ridge / Lasso / Elastic Net — linear with L2/L1/combined penalties; combats multicollinearity & selects features.
- ◇ Polynomial Regression — linear in expanded features; captures simple curvature.
- ◇ Support Vector Regression (SVR) — ε-insensitive loss; robust to outliers, slower at scale.
- ◇ Decision Tree Regressor — piecewise constant; interpretable, can overfit.
- ★ Random Forest Regressor — averages trees; strong, low tuning.
- ★ Gradient Boosted Trees (XGB/LGBM/CatBoost) — powerful on tabular, handles nonlinearity.
- ◇ kNN Regressor — local averaging; non-parametric, sensitive to scale.
- ◇ Gaussian Process Regression — smooth functions with uncertainty; exact inference scales as O(n³) in the number of samples.
- ◇ Robust Regression (Huber/RANSAC/Theil–Sen) — less sensitive to outliers.
- ◇ Generalized Linear Models (Poisson/Gamma/Tweedie) — non-Gaussian targets with link functions.
- ◇ Quantile Regression — predicts conditional quantiles; good for asymmetric loss.
- ★ k-Means — centroid-based, spherical clusters; fast, needs k.
- ◇ k-Medoids (PAM) — like k-means but uses actual points; robust to outliers.
- ★ Agglomerative Hierarchical (Ward/complete/etc.) — dendrogram; no need to pre-set k.
- ★ Gaussian Mixture Models (EM) — soft probabilistic clusters; elliptical shapes.
- ★ DBSCAN — density-based; finds arbitrary shapes & noise, no k, needs ε/minPts.
- ◇ HDBSCAN — DBSCAN variant with hierarchy; less parameter sensitive.
- ◇ OPTICS — orders points by density; handles variable density.
- ◇ Spectral Clustering — graph Laplacian; good for non-convex manifolds.
- ◇ Mean Shift — mode seeking; number of clusters follows from the bandwidth instead of being set directly.
- ◇ BIRCH — incremental; efficient for very large datasets.
- ◇ Self-Organizing Maps (SOM) — neural grid preserving topology.
- ★ PCA — orthogonal variance directions; linear, fast.
- ◇ Truncated SVD — PCA for sparse data (e.g., TF-IDF) without centering.
- ◇ Kernel PCA — nonlinear variant via kernels.
- ★ t-SNE — preserves local neighborhoods; great for visualization, not for downstream features.
- ★ UMAP — typically faster than t-SNE; tends to keep more global structure alongside local structure; popular for viz & features.
- ◇ Isomap — geodesic distances on manifolds.
- ◇ LLE (and HLLE) — locally linear embeddings; manifold learning.
- ◇ Autoencoders (deep) — nonlinear compression via NN bottleneck.
- ◇ NMF — non-negative factors; parts-based representations (e.g., topics).
- ◇ ICA — independent components; separates mixed signals.
- ◇ Factor Analysis — latent variable model for covariance structure.
- ◇ Random Projections — Johnson–Lindenstrauss; very fast approximate reduction.
- ◇ Fisher’s LDA — supervised projection maximizing class separability.
- ★ Hold-out Split — simple train/validation/test evaluation.
- ★ k-Fold Cross-Validation (Stratified/Group/Time-Series) — robust generalization estimates.
- ◇ Grid Search — exhaustive hyperparameter scan over a grid.
- ★ Random Search — random sampling; surprisingly strong baseline.
- ◇ Bayesian Optimization (GP/TPE/SMBO) — sample-efficient hyperparameter tuning.
- ◇ Successive Halving / Hyperband / ASHA — early-stops poor configs to save compute.
- ★ Early Stopping — stop training when validation stalls (boosting/NNs).
- ◇ Information Criteria (AIC/BIC) — penalized likelihood for model order.
- ◇ Nested Cross-Validation — inner loop tunes, outer loop scores; avoids the optimistic bias of tuning and evaluating on the same folds.
- ◇ Bootstrap (.632+) — resampling-based error estimation, small-data friendly.
- ★ Ensembling/Stacking/Blending — combine models; often beats single best.
- ★ Scaling — Standard, Min-Max, MaxAbs, Robust; essential for distance- and kernel-based models (kNN, SVM, k-means).
- ◇ Power Transforms — Box-Cox, Yeo–Johnson; stabilize variance, Gaussianize.
- ★ Imputation — Mean/Median, KNNImputer, Iterative/MICE; fills missing values.
- ★ Categorical Encoding — One-Hot, Ordinal, Target/Mean, Leave-One-Out, Hashing.
- ★ Text Features — tokenization, n-grams, TF-IDF, subword/BPE; basic NLP pipeline.
- ◇ Feature Generation — polynomial/interactions, feature hashing, datetime decompositions.
- ★ Feature Selection — filter (chi²/F-test/MI), wrapper (RFE), embedded (L1/Lasso, tree importances).
- ◇ Outlier Handling — z-score/Winsorization, Isolation Forest, Local Outlier Factor.
- ★ Imbalance Handling — stratified splits, class weights, SMOTE/ADASYN oversampling, undersampling.
- ◇ Time-Series Prep — differencing, rolling stats, seasonal decomposition, lag features.
- ◇ Image Prep — normalization, resizing, augmentation (flip/rotate/crop).
- ◇ Target Transforms — log/Box-Cox for skewed regression targets.
- ★ Leakage-Safe Pipelines — fit transforms on the training split only; wrap preprocessing and model in one pipeline so each CV fold refits both.
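
The scaling, k-fold cross-validation, and leakage-safe items above can be sketched together. This is a minimal, dependency-free illustration, not a reference implementation: every function name (`fit_scaler`, `apply_scaler`, `knn_predict`, `cross_val_accuracy`) and the synthetic data are assumptions made for the example. The point it tries to show is that the scaler's statistics are recomputed from the training rows of each fold and only then applied to the held-out rows.

```python
import math
import random
import statistics


def fit_scaler(rows):
    """Learn per-column mean and std from the given (training) rows only."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def apply_scaler(rows, scaler):
    """Standardize rows with statistics learned elsewhere."""
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(
        (math.dist(p, query), label) for p, label in zip(train_X, train_y)
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def cross_val_accuracy(X, y, k_folds=5, k_neighbors=3, seed=0):
    """k-fold CV in which the scaler is refit inside every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k_folds] for i in range(k_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in idx if i not in held]
        # Leakage-safe step: statistics come from the training fold only.
        scaler = fit_scaler([X[i] for i in train_idx])
        train_X = apply_scaler([X[i] for i in train_idx], scaler)
        test_X = apply_scaler([X[i] for i in held_out], scaler)
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, q, k_neighbors) == y[i]
            for q, i in zip(test_X, held_out)
        )
        scores.append(correct / len(held_out))
    return scores


# Synthetic data (an assumption for the demo): feature 0 separates the
# classes on a small scale, feature 1 is pure noise on a large scale.
rng = random.Random(42)
X, y = [], []
for label in (0, 1):
    for _ in range(50):
        X.append([rng.gauss(label * 6.0, 1.0), rng.gauss(0.0, 1000.0)])
        y.append(label)

scores = cross_val_accuracy(X, y)
mean_accuracy = statistics.fmean(scores)
print(f"per-fold accuracy: {scores}")
print(f"mean accuracy: {mean_accuracy:.3f}")
```

Without the scaling step, the large-scale noise feature would dominate the Euclidean distance, which is why scaling matters for kNN. In practice a library pipeline object (for example scikit-learn's `Pipeline` passed to its cross-validation helpers) handles the per-fold refitting for you; the hand-rolled loop here only makes that behavior visible.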