Legend: ★ = must-know, ◇ = nice-to-have.

Classification

★ Logistic Regression — linear decision boundary; well-calibrated probs, fast baseline.
◇ k-Nearest Neighbors — vote of nearby points; non-parametric, can be slow at scale.
★ Naive Bayes (Gaussian/Multinomial) — assumes features are conditionally independent given the class; great for text.
★ Support Vector Machines (SVM) — max-margin hyperplane; kernels handle nonlinearity.
★ Decision Tree — if-then rules; interpretable, prone to overfit without pruning.
★ Random Forest — bagged trees; strong default, robust to noise, handles mixed types.
★ Gradient Boosting (XGBoost/LightGBM/CatBoost) — boosted trees; often state-of-the-art on tabular data.
★ Neural Networks (MLP/CNN/RNN/Transformers) — flexible function approximators; need careful tuning.
◇ Linear/Quadratic Discriminant Analysis (LDA/QDA) — generative Gaussian class models with linear/quadratic boundaries; quick, simple.
◇ Gaussian Process Classification — probabilistic with uncertainty; expensive for large n.
◇ RuleFit / RIPPER — sparse rule sets; interpretable alternatives to trees.
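
The "vote of nearby points" idea behind k-Nearest Neighbors can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `knn_predict`, the toy points, and the choice of Euclidean distance with k=3 are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated 2-D groups.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y = ["a", "a", "a", "b", "b", "b"]

pred_low = knn_predict(X, y, (0.05, 0.05))
pred_high = knn_predict(X, y, (1.0, 0.95))
```

Every prediction sorts the whole training set, which is why the entry above flags kNN as slow at scale; real implementations usually rely on spatial indexes or approximate search instead.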

Regression

★ Ordinary Least Squares — linear relationship with MSE objective; fast baseline.
★ Ridge / Lasso / Elastic Net — linear with L2/L1/combined penalties; L2 stabilizes fits under multicollinearity, L1 selects features.
◇ Polynomial Regression — linear in expanded features; captures simple curvature.
◇ Support Vector Regression (SVR) — ε-insensitive loss; robust to outliers, slower at scale.
◇ Decision Tree Regressor — piecewise constant; interpretable, can overfit.
★ Random Forest Regressor — averages trees; strong, low tuning.
★ Gradient Boosted Trees (XGB/LGBM/CatBoost) — powerful on tabular, handles nonlinearity.
◇ kNN Regressor — local averaging; non-parametric, sensitive to scale.
◇ Gaussian Process Regression — smooth functions with uncertainty; exact inference costs O(n³) in the number of samples.
◇ Robust Regression (Huber/RANSAC/Theil–Sen) — less sensitive to outliers.
◇ Generalized Linear Models (Poisson/Gamma/Tweedie) — non-Gaussian targets with link functions.
◇ Quantile Regression — predicts conditional quantiles; good for asymmetric loss.
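
To make the OLS and Ridge entries concrete, here is a small sketch for the one-feature case, where both have a closed form. The helper name `fit_line` and the data are hypothetical, and the L2 penalty is assumed to apply to the slope only, not the intercept.

```python
def fit_line(xs, ys, l2=0.0):
    """Fit y = intercept + slope * x by least squares.

    With l2 > 0 this is ridge regression on the slope (intercept unpenalized);
    with l2 == 0 it reduces to ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of products and squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The penalty enters the denominator and shrinks the slope toward zero.
    slope = sxy / (sxx + l2)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

ols_slope, ols_intercept = fit_line(xs, ys)
ridge_slope, ridge_intercept = fit_line(xs, ys, l2=5.0)
```

On this data OLS recovers the slope exactly, while the ridge fit returns a smaller slope. That shrinkage is the mechanism that steadies coefficients when features are correlated.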

Clustering

★ k-Means — centroid-based, spherical clusters; fast, needs k.
◇ k-Medoids (PAM) — like k-means but uses actual points; robust to outliers.
★ Agglomerative Hierarchical (Ward/complete/etc.) — dendrogram; no need to pre-set k.
★ Gaussian Mixture Models (EM) — soft probabilistic clusters; elliptical shapes.
★ DBSCAN — density-based; finds arbitrary shapes & noise, no k, needs ε/minPts.
◇ HDBSCAN — DBSCAN variant with hierarchy; less parameter sensitive.
◇ OPTICS — orders points by density; handles variable density.
◇ Spectral Clustering — graph Laplacian; good for non-convex manifolds.
◇ Mean Shift — mode seeking; the number of clusters follows from the bandwidth.
◇ BIRCH — incremental; efficient for very large datasets.
◇ Self-Organizing Maps (SOM) — neural grid preserving topology.
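
The centroid-based idea behind k-Means can be sketched as Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The function names, the fixed starting centroids, and the toy points below are assumptions for illustration; practical implementations typically add smarter initialization and multiple restarts.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm starting from the given initial centroids."""
    centroids = list(centroids)
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return [nearest(p, centroids) for p in points], centroids


# Hypothetical toy data: two tight groups far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, centroids=[(0, 0), (10, 10)])
```

The caller has to supply the number of clusters through the initial centroids, which is the "needs k" caveat in the list above.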

Dimensionality Reduction

★ PCA — orthogonal variance directions; linear, fast.
◇ Truncated SVD — PCA for sparse data (e.g., TF-IDF) without centering.
◇ Kernel PCA — nonlinear variant via kernels.
★ t-SNE — preserves local neighborhoods; great for visualization, not for downstream features.
★ UMAP — typically faster than t-SNE and tends to keep more global structure alongside local; popular for viz & features.
◇ Isomap — geodesic distances on manifolds.
◇ LLE (and HLLE) — locally linear embeddings; manifold learning.
◇ Autoencoders (deep) — nonlinear compression via NN bottleneck.
◇ NMF — non-negative factors; parts-based representations (e.g., topics).
◇ ICA — independent components; separates mixed signals.
◇ Factor Analysis — latent variable model for covariance structure.
◇ Random Projections — Johnson–Lindenstrauss; very fast approximate reduction.
◇ Fisher’s LDA — supervised projection maximizing class separability.
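
As a rough illustration of PCA's "variance directions", the sketch below finds the first principal component by power iteration on the sample covariance matrix. The function name, the iteration count, and the toy data are assumptions. The method also assumes the starting vector is not orthogonal to the top component, and library implementations generally use an SVD instead.

```python
import math


def first_principal_component(rows, n_iter=100):
    """Return (unit direction, projected scores) of the top principal component."""
    n, d = len(rows), len(rows[0])
    # Center each column; PCA is defined on mean-centered data.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance and renormalize.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the centered data onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical 2-D data scattered around the line y = x.
data = [(2.0, 1.9), (0.0, 0.1), (-2.0, -2.1), (1.0, 1.2), (-1.0, -1.1)]
direction, scores = first_principal_component(data)
```

For this data the recovered direction lies close to the diagonal, and the one-dimensional `scores` keep most of the spread of the original two columns.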

Model Selection & Tuning

★ Hold-out Split — simple train/validation/test evaluation.
★ k-Fold Cross-Validation (Stratified/Group/Time-Series) — robust generalization estimates.
◇ Grid Search — exhaustive hyperparameter scan over a grid.
★ Random Search — random sampling; surprisingly strong baseline.
◇ Bayesian Optimization (GP/TPE/SMBO) — sample-efficient hyperparameter tuning.
◇ Successive Halving / Hyperband / ASHA — early-stops poor configs to save compute.
★ Early Stopping — stop training when validation stalls (boosting/NNs).
◇ Information Criteria (AIC/BIC) — penalized likelihood for model order.
◇ Nested Cross-Validation — tunes in an inner loop, scores in an outer loop; avoids the optimistic bias of reusing one CV for both.
◇ Bootstrap (.632+) — resampling-based error estimation, small-data friendly.
★ Ensembling/Stacking/Blending — combine models; often beats single best.
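
The splitting logic behind k-fold cross-validation can be sketched as an index generator. The name `k_fold_indices` is hypothetical, and the folds here are contiguous and unshuffled. Stratified, group, or time-series variants need extra constraints that this sketch does not implement.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        # Everything outside the validation fold is used for training.
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
val_sizes = [len(val) for _, val in folds]
```

Each sample lands in exactly one validation fold, so averaging a model's score over the k folds uses every sample for validation once.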

Preprocessing & Feature Engineering

★ Scaling — Standard, Min-Max, MaxAbs, Robust; essential for distance/kernels.
◇ Power Transforms — Box-Cox, Yeo–Johnson; stabilize variance, Gaussianize.
★ Imputation — Mean/Median, KNNImputer, Iterative/MICE; fills missing values.
★ Categorical Encoding — One-Hot, Ordinal, Target/Mean, Leave-One-Out, Hashing.
★ Text Features — tokenization, n-grams, TF-IDF, subword/BPE; basic NLP pipeline.
◇ Feature Generation — polynomial/interactions, feature hashing, datetime decompositions.
★ Feature Selection — filter (chi²/F-test/MI), wrapper (RFE), embedded (L1/Lasso, tree importances).
◇ Outlier Handling — z-score/Winsorization, Isolation Forest, Local Outlier Factor.
★ Class Imbalance — stratified splits, class weights, SMOTE/ADASYN oversampling, undersampling.
◇ Time-Series Prep — differencing, rolling stats, seasonal decomposition, lag features.
◇ Image Prep — normalization, resizing, augmentation (flip/rotate/crop).
◇ Target Transforms — log/Box-Cox for skewed regression targets.
★ Leakage-Safe Pipelines — fit transforms only on train; use pipelines to avoid leakage.
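
The "fit transforms only on train" rule can be illustrated with a tiny standard scaler. The class name `Standardizer` and the sample values are assumptions for this sketch. It uses the population standard deviation and handles a single numeric column only.

```python
import statistics


class Standardizer:
    """Standard scaling whose statistics come from the data passed to fit()."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        # Fall back to 1.0 so a constant column does not divide by zero.
        self.std_ = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


# Hypothetical split: the test set contains an extreme value.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [6.0, 100.0]

scaler = Standardizer().fit(train)      # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)    # reuse train statistics, never refit
```

Fitting on train and test together would let the extreme test value shift the mean and standard deviation applied to the training data, which is exactly the leakage the last entry warns about.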