@xdralex
Created April 19, 2021 13:39
ML/DL/CS Checklist

Deep Learning

concepts

  • forward and backward propagation
  • vanishing gradient
  • image convolution operation
  • feature map, filter/kernel
  • receptive field
  • embedding
  • translation invariance

ideas

  • transfer learning
  • augmentation
  • semantic segmentation, instance segmentation, panoptic segmentation
  • object detection, localization

details

  • bias trick
  • weight initialization: xavier VS he (kaiming)
  • early stopping, learning rate annealing / decay
  • learning rate range test, cyclical learning rate, cosine learning rate
  • learning rate vs batch size
  • L1 vs L2, dropout regularization
  • ensembling models trained on cross-validation folds
  • online augmentation, test time augmentation
  • encoder/decoder, autoencoder
  • non maximum suppression (detection)
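For the non maximum suppression item, a minimal greedy sketch in plain Python may help as a reference. It assumes boxes are given as `(x1, y1, x2, y2)` corner tuples; the function names `iou` and `nms` and the default threshold of 0.5 are illustrative choices, not taken from any particular library.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

This version is quadratic in the number of boxes, which is fine for illustration; production detectors typically use a vectorized implementation.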

optimization

  • SGD
  • momentum, nesterov momentum
  • adagrad, rmsprop
  • adam, r-adam, n-adam

loss functions and metrics

  • log loss, nll loss
  • smooth L1 loss
  • triplet loss
  • softmax, softplus, softshrink, softsign
  • cross entropy loss, binary cross entropy loss, balanced cross entropy loss
  • focal loss
  • huber loss
  • hinge loss / multiclass SVM loss / max-margin loss
  • lovasz-softmax loss
  • dice coefficient, jaccard coefficient (IoU), soft dice, soft jaccard
  • average precision (AP), average recall (AR), AP11, mAP@[.5:.95], localization-recall-precision (LRP)
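As a reference for the softmax and cross entropy items, here is a small sketch using only the standard library. It assumes a single example with raw logits and an integer class index; the function names are illustrative. Both functions subtract the maximum logit first, a common way to avoid overflow in `exp`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of class `target` under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]
```

With a single example, this cross entropy equals the NLL loss applied to the log of the softmax output.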

layers

  • fully connected (dense) layer
  • relu, leaky relu, exponential linear unit, prelu
  • dropout, maxout
  • tanh, sigmoid
  • (spatial, temporal) batch normalization layer
  • convolutional layer
  • maxpool layer, avgpool, global pooling
  • unpooling layer
  • deconvolutional layer
  • shortcut connections
  • dilated convs, depthwise separable convs, bottleneck blocks
  • group convolution

tricks

  • squeeze-and-excitation
  • warm restarts
  • learning rate warmup
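Learning rate warmup is often combined with a cosine learning rate. The sketch below shows one possible combination, assuming linear warmup followed by a single cosine decay; the function name and its parameters are hypothetical and do not mirror any framework's scheduler API.

```python
import math


def lr_at_step(step, total_steps, base_lr, warmup_steps=0, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp up linearly; the last warmup step is just below base_lr.
        return base_lr * (step + 1) / (warmup_steps + 1)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Warm restarts would repeat the cosine phase several times, usually with a growing period, instead of running it once.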

classification

  • LeNet
  • AlexNet
  • ZFNet
  • VGG
  • GoogLeNet
  • ResNet, ResNeXt, ResNet SE, ResNet-D, WRN
  • Inception, Inception-ResNet
  • Xception
  • MobileNet
  • ShuffleNet

semantic segmentation

  • FCN8
  • SegNet
  • UNet
  • PSPNet
  • FPN
  • DeepLab, DeepLab v3
  • ENet
  • ICNet

object detection

  • R-CNN
  • Fast R-CNN
  • Faster R-CNN
  • SSD
  • YOLO, YOLO9000
  • RetinaNet
  • CenterNet

instance segmentation

  • Mask R-CNN
  • YOLACT

pose estimation

  • PoseNet
  • DensePose

gan

  • GAN
  • DCGAN
  • WGAN
  • pix2pix
  • CycleGAN

other

  • SqueezeNet
  • DetectNet

distributed training

  • model parallel
  • data parallel
  • microbatches combination
  • on-demand parameters loading
  • ZeRO optimizer state, parameters and gradient partitioning

Machine learning

concepts

  • max likelihood, entropy, cross-entropy, conditional entropy
  • mutual information
  • gini impurity, information gain, KL-divergence, variance reduction
  • stacking, bagging, boosting
  • hyperparameters vs model parameters
  • kolmogorov complexity

optimization

  • gradient descent, SGD
  • newton method, newton-raphson method, L-BFGS, quasi-newton methods
  • non-linear conjugate gradient method
  • jacobian, hessian
  • quadratic optimization
  • line search, backtracking line search, trust region

linear algebra

  • linear independence, rank of a matrix, determinant
  • eigen decomposition, eigenvectors, eigenvalues
  • SVD, truncated SVD
  • matrix diagonalization
  • positive definite matrix

regularization

  • L1, L2 regularization
  • trees regularization

transformation

  • Box-Cox transform
  • unbounded Johnson transform
  • log/sqrt transform
  • normalization (z-scoring)
  • frequency encoding, target encoding, one-hot encoding

regression

  • linear/non-linear/segmented/local regression, ordinary least squares
  • lasso/ridge regression, elastic net
  • gradient boosting, random forest
  • coefficient of determination (R2), adjusted R2, fraction of variance unexplained
  • residuals analysis: residuals vs predictors, residuals histogram, studentized residuals
  • heteroskedasticity
  • k nearest neighbors

classification

  • logistic regression, logit, odds, log-odds
  • SVM, kernel trick
  • contingency table, confusion matrix
  • precision, recall, specificity/selectivity, accuracy, F1, informedness
  • ROC AUC, precision-recall curve
  • multiclass classification - separating hyperplanes
  • one vs all, all vs all - multiclass classifications
  • type I and II errors
  • xgboost vs catboost vs lightgbm

clustering

  • k-means

dimensionality reduction

  • PCA, Robust PCA
  • t-SNE
  • feature selection

cross-validation

  • exhaustive cross-validation, non-exhaustive cross-validation, nested cross-validation
  • leave-p-out validation
  • k-fold validation, stratified k-fold validation
  • holdout validation method
  • repeated random subsampling validation
  • k*l-fold validation, k-fold validation with validation and test set
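To make the k-fold validation item concrete, the following sketch generates index splits without any library. It assumes contiguous, unshuffled folds; the helper name `k_fold_indices` is illustrative. Stratified k-fold would additionally balance class proportions per fold, which this sketch does not do.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size
```

Each sample lands in exactly one validation fold, so every sample is used for validation once and for training k-1 times.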

kernel

  • window function
  • kernel smoother
  • kernel density estimation / parzen window
  • multivariate kernel density estimation
  • uniform, epanechnikov, triangle, tricube, quartic, gaussian, quadratic, cosine kernels

algorithms

  • kd-tree, ball-tree

Probability & statistics

concepts

  • pdf, cdf, pmf
  • central moment vs raw moment
  • mean, median, mode, variance, covariance, skewness, kurtosis, moments
  • quantiles, quartiles, interquartile range (IQR), box plot
  • hypothesis testing
  • sample mean and variance
  • confidence intervals, prediction intervals
  • correlation, pearson's correlation, spearman's rank correlation, kendall correlation
  • bias-variance tradeoff
  • likelihood function, maximum likelihood estimation (MLE), maximum a posteriori estimation (MAP)
  • memoryless distribution
  • kullback-leibler divergence, entropy of a distribution
  • cross-entropy
  • convolution of probability distributions, distribution of the sum of random variables
  • mixture distribution
  • probability transformation rule
  • kernel trick
  • hilbert space

distributions

  • bernoulli, binomial, multinomial, geometric, hypergeometric distributions
  • uniform, normal, laplace, exponential, poisson, chi-square, gamma, beta, student's t distributions

statistical tests

  • null hypothesis, p-value, one-tailed and two-tailed tests, statistical significance
  • type I and type II errors
  • z-score, t-statistic, z-test, t-tests
  • fisher's method / combined probability test
  • f-value, f-test, pearson's chi-squared test, fisher's exact test, kolmogorov-smirnov test
  • bessel's correction
  • bonferroni correction
  • family-wise error rate

Calculus

concepts

  • support of a function

functions

  • sign function, kronecker delta function
  • dirac delta function, heaviside step function, ramp function
  • gamma function, beta function

operations

  • convolution, circular convolution, discrete convolution
  • integral transform
  • Fourier transform
  • subderivative
  • automatic differentiation, reverse-mode autodiff
  • taylor expansion

Algorithms

concepts

  • big theta, big oh, big omega
  • master theorem
  • backtracking
  • dynamic programming
  • reductions
  • Turing machine, nondeterministic Turing machine
  • P, NP, NP-complete, NP-hard problems

sorting

  • N*logN worst-case lower bound proof (comparison-based)
  • sort stability
  • insertion sort
  • merge sort
  • quick sort
  • heap sort
  • count/bucket sort
  • reservoir sampling
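Reservoir sampling is compact enough to sketch in full. The version below follows the classic single-pass scheme usually called Algorithm R, assuming the stream is any iterable of unknown length; the `rng` parameter is an illustrative addition that makes runs reproducible.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

If the stream has fewer than k items, the function simply returns all of them.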

selection and search

  • quick select
  • range search
  • interval search
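For quick select, a short out-of-place sketch is given below. It assumes a 0-based rank `k` and a random pivot, and it copies sublists rather than partitioning in place, trading memory for readability.

```python
import random


def quick_select(items, k):
    """Return the k-th smallest element (0-based) of a non-empty sequence."""
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    items = list(items)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        if k < len(lows):
            items = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            # Skip everything less than or equal to the pivot.
            k -= len(lows) + len(pivots)
            items = [x for x in items if x > pivot]
```

The expected running time is linear in the input size, although an unlucky sequence of pivots can make it quadratic.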

graphs

  • biconnectivity, planarity, isomorphism
  • representation: set of edges, adjacency matrix, adjacency list
  • dfs, bfs
  • find shortest (directed) path
  • check (strong/directed) connectivity
  • find (strong/directed) connected components - Kosaraju-Sharir algorithm
  • check if graph is bipartite / has loops / has euler tour / has hamilton tour / is planar
  • check if graphs are isomorphic
  • topological sort
  • minimum spanning tree (MST)
  • Kruskal algorithm (using union-find)
  • Prim algorithm (lazy, eager)
  • shortest path tree (SPT)
  • Dijkstra algorithm for non-negative weights
  • topological sort for finding the SPT in edge-weighted DAGs
  • Bellman-Ford algorithm for graphs without negative cycles
  • st-flow, st-cut, maximum flow, mincut
  • Ford-Fulkerson algorithm (shortest/fattest path) for max flow
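As an illustration of the Kruskal algorithm with union-find from the list above, here is a minimal sketch. It assumes vertices are numbered `0..n_vertices-1` and edges are `(weight, u, v)` tuples; the union-find uses path halving only and omits union by rank for brevity.

```python
def kruskal(n_vertices, edges):
    """Return the MST (or spanning forest) edges as (weight, u, v) tuples."""
    parent = list(range(n_vertices))

    def find(x):
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            # The edge joins two components, so it cannot create a cycle.
            parent[ru] = rv
            mst.append((w, u, v))
    return mst
```

On a disconnected graph the result is a minimum spanning forest with fewer than `n_vertices - 1` edges.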

strings

  • key-indexed counting
  • LSD radix sort
  • MSD radix sort
  • 3-way string quicksort
  • key-in-context search
  • longest repeating substring search
  • Manber-Myers suffix array sort
  • Knuth-Morris-Pratt algorithm (deterministic finite state machine DFA + stream search)
  • Boyer-Moore algorithm (skip table)
  • Rabin-Karp algorithm (fingerprint search)
  • regex <=> DFA duality (Kleene's theorem), exponentiality of DFA space
  • regex => NFA; NFA simulation, NFA construction
  • RLE, Huffman compression, LZW compression

Data Structures

  • disjoint-set (union-find)
  • stack (linkedlist/array), queue (array/linkedlist), deque
  • heap (binary heap, d-ary heap, fibonacci heap?, binomial heap?)
  • bst (2-3 tree, red-black tree, b-tree)
  • skip list, segment tree, cartesian tree (treap)
  • 2d space tree (grid, 2d tree, quadtree, bsp tree, kd-tree, interval search tree)
  • hash table (separate chaining, linear probing, two-probe chaining, cuckoo hashing)
  • set (hash table, bst)
  • suffix array, suffix tree
  • fenwick tree
  • r-way trie, ternary search trie (TST), TST with r^2-branching at root, patricia trie/radix tree

additional

  • Floyd algorithm
  • Kadane algorithm
  • Brent algorithm
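Of the three items above, the Kadane algorithm for the maximum subarray sum is the shortest to sketch. The version below assumes a non-empty list of numbers and requires the subarray to contain at least one element.

```python
def max_subarray_sum(values):
    """Largest sum over all non-empty contiguous subarrays."""
    if not values:
        raise ValueError("values must be non-empty")
    best = current = values[0]
    for v in values[1:]:
        # Either extend the running subarray or start a new one at v.
        current = max(v, current + v)
        best = max(best, current)
    return best
```

The scan is a single pass with constant extra memory.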
@sueszli

sueszli commented Nov 13, 2024

Systems

Parallel processing models

  • data parallelism vs model parallelism in distributed training
  • synchronous vs asynchronous distributed learning
  • parameter server architecture
  • allreduce communication pattern
  • mapreduce distributed computing paradigm

Resource management

  • load balancing and resource allocation
  • dynamic resource scheduling
  • workload distribution strategies
  • gpu memory management and scheduling
  • batch processing systems

Infra

  • compute clusters and nodes
  • high-speed interconnects and networking
  • gpu acceleration and hybrid computing
  • distributed storage systems
  • memory hierarchies (ram, nvram, caches)

Perf

  • scalability and throughput optimization
  • fault tolerance and resilience
  • i/o contention handling
  • network congestion management
  • performance profiling and benchmarking
  • memory access optimization
  • communication overhead reduction
  • gradient synchronization strategies
  • pipeline parallelism
  • workload partitioning

Frameworks

  • distributed training frameworks
  • cuda programming model
  • mpi (message passing interface)
  • unified memory architectures
  • containerization and orchestration

@sueszli

sueszli commented Nov 13, 2024

(really nice list, thank you!)
