Optimization Algorithms Overview

Applications Overview


1. Gradient Descent (GD) & Variants (SGD, Adam)

  • Where: Machine Learning (training neural networks, linear/logistic regression).
  • Why:
    • Simplicity: Easy to implement for differentiable objectives.
    • Scalability: Stochastic variants (SGD) handle large datasets.
    • Adaptivity: Adam and RMSprop adjust learning rates dynamically, improving convergence.
  • Example: Training deep learning models (e.g., ResNet, GPT).

2. Genetic Algorithms (GA)

  • Where: Engineering design, combinatorial optimization (e.g., aerospace, logistics).
  • Why:
    • Global Search: Avoids local optima in non-convex/non-differentiable problems.
    • Flexibility: Works with discrete/continuous variables.
  • Example: Optimizing aircraft wing shapes or factory layouts.

3. Simplex Method (Linear Programming)

  • Where: Operations Research (supply chain, resource allocation).
  • Why:
    • Efficiency: Solves linear problems with thousands of variables.
    • Robustness: Well-understood for real-world LP problems.
  • Example: Maximizing profit in manufacturing resource allocation.

4. Particle Swarm Optimization (PSO)

  • Where: Control systems, robotics, antenna design.
  • Why:
    • Swarm Intelligence: Balances exploration and exploitation without gradients.
    • Parallelism: Easy to parallelize for distributed systems.
  • Example: Tuning PID controllers in industrial automation.

5. Interior-Point Methods

  • Where: Convex optimization (finance, engineering).
  • Why:
    • Speed: Polynomial-time convergence for large convex problems.
    • Accuracy: Handles ill-conditioned matrices better than Simplex.
  • Example: Portfolio optimization (Markowitz model).

6. Bayesian Optimization

  • Where: Hyperparameter tuning (ML/AI), drug discovery.
  • Why:
    • Sample Efficiency: Minimizes expensive evaluations (e.g., training large models).
    • No Gradients: Ideal for black-box functions.
  • Example: Optimizing learning rates/batch sizes for deep learning.

7. Simulated Annealing (SA)

  • Where: VLSI design, scheduling, materials science.
  • Why:
    • Escapes Local Optima: Probabilistic acceptance of worse solutions.
    • Simplicity: Easy to implement for combinatorial problems.
  • Example: Optimizing chip layouts to minimize heat dissipation.

8. L-BFGS (Limited-Memory BFGS)

  • Where: Medium-scale ML models (e.g., logistic regression, SVMs).
  • Why:
    • Memory Efficiency: Approximates Hessian without storing large matrices.
    • Speed: Faster convergence than vanilla GD for smooth functions.
  • Example: Training medium-sized neural networks.

9. NSGA-II (Multi-Objective)

  • Where: Engineering design, economics, environmental planning.
  • Why:
    • Pareto Optimality: Maintains diverse solutions on the trade-off front.
    • Scalability: Handles multiple conflicting objectives.
  • Example: Designing cars for cost, safety, and fuel efficiency.

10. Alternating Direction Method of Multipliers (ADMM)

  • Where: Distributed systems, image processing, finance.
  • Why:
    • Decentralization: Splits problems into subproblems for parallel solving.
    • Convergence: Provably convergent for convex problems; often effective as a heuristic on non-smooth or non-convex ones.
  • Example: Distributed training of federated learning models.

Why These Algorithms Dominate

  1. Problem-Specific Strengths:

    • Gradient-based methods (GD, Adam) dominate ML due to automatic differentiation frameworks (PyTorch, TensorFlow).
    • Metaheuristics (GA, PSO) excel in non-differentiable/combinatorial spaces.
    • Interior-point methods solve large convex problems faster than Simplex.
  2. Scalability:

    • SGD and Adam handle big data via mini-batches.
    • Bayesian Optimization minimizes costly evaluations (e.g., lab experiments).
  3. Ease of Use:

    • Libraries like scipy.optimize and Optuna abstract complexity.
  4. Theoretical Guarantees:

    • Convex methods (Interior-Point, ADMM) have convergence proofs.

Emerging Trends

  • Quantum Optimization: QAOA for finance/chemistry (still experimental).
  • AutoML: Algorithm selection via meta-learning (e.g., Google’s Vizier).
  • Differentiable Optimization: Integrating solvers into neural networks (e.g., CVXPYLayer).

Visualization


1. Gradient Descent

graph TD
  A[Initialize Parameters] --> B[Compute Gradient]
  B --> C[Update Parameters]
  C --> D{Converged?}
  D -->|No| B
  D -->|Yes| E[Return Parameters]

2. Genetic Algorithm

graph TD
  A[Initialize Population] --> B[Evaluate Fitness]
  B --> C[Select Parents]
  C --> D[Crossover]
  D --> E[Mutation]
  E --> F[Replace Population]
  F --> G{Max Generations?}
  G -->|No| B
  G -->|Yes| H[Return Best]

3. Simplex Method (Linear Programming)

graph TD
  A[Initialize Tableau] --> B[Select Pivot Column]
  B --> C[Select Pivot Row]
  C --> D[Perform Pivot]
  D --> E{Optimal?}
  E -->|No| B
  E -->|Yes| F[Return Solution]

4. Particle Swarm Optimization (PSO)

graph TD
  A[Initialize Particles] --> B[Update Velocities]
  B --> C[Update Positions]
  C --> D[Evaluate Fitness]
  D --> E[Update Best]
  E --> F{Stopping Condition?}
  F -->|No| B
  F -->|Yes| G[Return Best]

5. Adam Optimizer

graph TD
  A[Initialize m, v] --> B[Compute Gradient]
  B --> C[Update m]
  C --> D[Update v]
  D --> E[Bias Correction]
  E --> F[Update Parameters]
  F --> G{Converged?}
  G -->|No| B
  G -->|Yes| H[Return Parameters]

6. Simulated Annealing

graph TD
  A[Initialize T, Solution] --> B[Perturb Solution]
  B --> C[Calculate ΔE]
  C --> D{Accept?}
  D -->|Yes| E[Update Solution]
  D -->|No| F[Reject]
  E --> G[Cool T]
  F --> G
  G --> H{T > T_min?}
  H -->|Yes| B
  H -->|No| I[Return Best]

7. Ant Colony Optimization (ACO)

graph TD
  A[Initialize Pheromones] --> B[Ants Build Paths]
  B --> C[Update Pheromones]
  C --> D{Stopping Condition?}
  D -->|No| B
  D -->|Yes| E[Return Best Path]

8. Bayesian Optimization

graph TD
  A[Initialize Model] --> B[Select New Point]
  B --> C[Evaluate Function]
  C --> D[Update Model]
  D --> E{Max Iterations?}
  E -->|No| B
  E -->|Yes| F[Return Best]

9. NSGA-II (Multi-Objective)

graph TD
  A[Initialize Population] --> B[Non-Dominated Sort]
  B --> C[Crowding Distance]
  C --> D[Select Top]
  D --> E[Crossover/Mutation]
  E --> F{Generations < Max?}
  F -->|Yes| B
  F -->|No| G[Return Pareto Front]

10. ADMM

graph TD
  A[Initialize x, z, y] --> B[Update x]
  B --> C[Update z]
  C --> D[Update y]
  D --> E{Converged?}
  E -->|No| B
  E -->|Yes| F[Return x, z]

11. Levenberg-Marquardt

graph TD
  A[Initialize] --> B[Compute Residual]
  B --> C[Solve Equation]
  C --> D[Update Parameters]
  D --> E{Error Reduced?}
  E -->|Yes| F[Decrease λ]
  E -->|No| G[Increase λ]
  F --> H{Converged?}
  G --> H
  H -->|No| B
  H -->|Yes| I[Return Parameters]

12. Nelder-Mead (Simplex)

graph TD
  A[Initialize Simplex] --> B[Sort Points]
  B --> C[Compute Centroid]
  C --> D[Reflect]
  D --> E{Improved?}
  E -->|Yes| F[Expand]
  E -->|No| G[Contract]
  F --> H[Replace]
  G --> H
  H --> I{Converged?}
  I -->|No| B
  I -->|Yes| J[Return Best]

13. QAOA (Quantum Optimization)

graph TD
  A[Initialize Parameters] --> B[Prepare State]
  B --> C[Measure Energy]
  C --> D[Optimize Parameters]
  D --> E{Converged?}
  E -->|No| B
  E -->|Yes| F[Return Parameters]

14. Branch and Bound

graph TD
  A[Initialize Queue] --> B{Queue Empty?}
  B -->|Yes| C[Return Best]
  B -->|No| D[Dequeue Node]
  D --> E{Node Promising?}
  E -->|Yes| F[Branch]
  E -->|No| G[Prune]
  F --> H[Enqueue Children]
  H --> B
  G --> B

15. Differential Evolution

graph TD
  A[Initialize Population] --> B[For Each Vector]
  B --> C[Select Parents]
  C --> D[Generate Mutant]
  D --> E[Crossover]
  E --> F{Better?}
  F -->|Yes| G[Replace]
  F -->|No| H[Keep]
  G --> I{All Done?}
  H --> I
  I -->|No| B
  I -->|Yes| J{Converged?}
  J -->|No| B
  J -->|Yes| K[Return Best]

Theory and Implementation


1. Classical Optimization Algorithms

Gradient-Based Methods

  1. Gradient Descent

    • Theory: Iteratively updates parameters in the direction of the negative gradient to minimize a function.
    • Example: Training linear/logistic regression models.
    • Pseudo-Code (a runnable Go sketch follows at the end of this subsection):
      Initialize θ, learning rate α
      while not converged:
          Compute gradient ∇J(θ)
          θ = θ - α*∇J(θ)
  2. Conjugate Gradient

    • Theory: Solves linear systems by generating conjugate (orthogonal) search directions to accelerate convergence.
    • Example: Solving large sparse systems (e.g., finite element analysis).
    • Pseudo-Code:
      Initialize θ, residual r = b - Aθ, direction d = r
      while ||r|| > tolerance:  
          α = (rᵀr) / (dᵀAd)  
          θ = θ + α*d  
          r_new = r - α*A*d  
          β = (r_newᵀr_new) / (rᵀr)  
          d = r_new + β*d  
          r = r_new  
  3. Newton’s Method

    • Theory: Uses second derivatives (Hessian) for faster convergence near minima.
    • Example: Portfolio optimization in finance.
    • Pseudo-Code:
      Initialize θ  
      while not converged:  
          Compute gradient ∇J(θ), Hessian H
          θ = θ - H⁻¹∇J(θ)  
  4. Quasi-Newton (BFGS, L-BFGS)

    • Theory: Approximates the Hessian using gradient updates to avoid direct computation.
    • Example: Training support vector machines (SVMs).
    • Pseudo-Code (BFGS):
      Initialize θ, B = I (identity matrix)
      while not converged:
          Compute direction d = -B⁻¹∇J(θ)
          Perform line search for step size α
          s = α*d
          y = ∇J(θ + s) - ∇J(θ)
          B = B + (y yᵀ)/(yᵀs) - (B s sᵀ B)/(sᵀ B s)
          θ = θ + s
  5. Levenberg-Marquardt

    • Theory: Hybrid of gradient descent and Gauss-Newton for nonlinear least squares.
    • Example: Calibrating camera parameters in computer vision.
    • Pseudo-Code:
      Initialize θ, λ = 0.001  
      while not converged:  
          Compute residual r, Jacobian J  
          Δθ = (JᵀJ + λI)⁻¹ Jᵀr  
          if ||r(θ + Δθ)||² < ||r(θ)||²:
              θ = θ + Δθ  
              λ = λ/10  
          else:  
              λ = λ*10  
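
To ground the recipes above, here is a minimal Go sketch of plain gradient descent (item 1). The quadratic objective, learning rate, and gradient-norm stopping rule are illustrative choices, not part of any particular library.

package main

import (
	"fmt"
	"math"
)

// gradientDescent minimizes an objective given its gradient grad,
// starting from theta, with step size alpha.
func gradientDescent(grad func([]float64) []float64, theta []float64, alpha, tol float64, maxIter int) []float64 {
	for iter := 0; iter < maxIter; iter++ {
		g := grad(theta)
		norm := 0.0
		for i := range theta {
			theta[i] -= alpha * g[i] // θ = θ - α*∇J(θ)
			norm += g[i] * g[i]
		}
		if math.Sqrt(norm) < tol { // converged: gradient is (near) zero
			break
		}
	}
	return theta
}

func main() {
	// Toy objective J(θ) = (θ₀-3)² + (θ₁+1)², gradient (2(θ₀-3), 2(θ₁+1))
	grad := func(t []float64) []float64 {
		return []float64{2 * (t[0] - 3), 2 * (t[1] + 1)}
	}
	fmt.Println(gradientDescent(grad, []float64{0, 0}, 0.1, 1e-8, 1000)) // ≈ [3 -1]
}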

Derivative-Free Methods

  1. Nelder-Mead (Simplex)

    • Theory: Adjusts a simplex through reflection, expansion, and contraction.
    • Example: Tuning hyperparameters for industrial simulations.
    • Pseudo-Code:
      Initialize simplex with n+1 points  
      while not converged:  
          Sort points by f(x)  
          Compute centroid (excluding worst)  
          Reflect worst point over centroid  
          if f(reflected) < f(best): Expand
          elif f(reflected) < f(second worst): Replace worst
          else: Contract or shrink simplex  
  2. Pattern Search

    • Theory: Explores points in a predefined geometric pattern (e.g., grid).
    • Example: Optimizing aircraft wing shapes.
    • Pseudo-Code:
      Initialize θ, step size Δ  
      while Δ > tolerance:  
          Evaluate f(θ ± Δ·eᵢ) along each coordinate direction
          if improvement found:  
              Move θ to better point  
              Δ = 1.5*Δ  # Expand step  
          else:  
              Δ = 0.5*Δ  # Shrink step  
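
Pattern search (the last item above) is short enough to implement directly. The sketch below performs a compass search over coordinate directions, with the 1.5×/0.5× expand/shrink factors from the pseudo-code; the objective f is supplied by the caller.

package main

// patternSearch performs a coordinate (compass) search around theta,
// expanding the step on success and shrinking it on failure.
func patternSearch(f func([]float64) float64, theta []float64, delta, tol float64) []float64 {
	for delta > tol {
		improved := false
		base := f(theta)
		for i := range theta {
			for _, dir := range []float64{delta, -delta} {
				theta[i] += dir // try a move along coordinate i
				if f(theta) < base {
					improved = true
					break // keep the improving point
				}
				theta[i] -= dir // undo the move
			}
			if improved {
				break
			}
		}
		if improved {
			delta *= 1.5 // expand step
		} else {
			delta *= 0.5 // shrink step
		}
	}
	return theta
}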

Linear Programming

  1. Simplex Method

    • Theory: Moves along vertices of the feasible region to find the optimal solution.
    • Example: Maximizing profit in manufacturing resource allocation.
    • Pseudo-Code:
      Convert to standard form (Ax = b, x ≥ 0)
      while pivot exists:  
          Select entering variable (most negative coefficient)  
          Select leaving variable (min ratio test)  
          Pivot to update tableau  
  2. Interior-Point Methods

    • Theory: Traverses the interior of the feasible region using barrier functions.
    • Example: Solving large-scale logistics optimization.
    • Pseudo-Code:
      Initialize x > 0, μ > 0, tolerance  
      while duality gap > tolerance:  
          Solve Newton step for barrier problem  
          Update x, μ = σμ (σ ∈ (0,1))  

Nonlinear Programming

  1. Trust Region Methods

    • Theory: Approximates the objective locally with a model (e.g., quadratic) within a trusted region.
    • Example: Fitting nonlinear curves to sensor data.
    • Pseudo-Code:
      Initialize θ, trust region radius Δ  
      while not converged:  
          Solve subproblem: min model m_k(θ + s) within ||s|| ≤ Δ
          Compute actual/predicted reduction ratio ρ  
          if ρ > 0.75: Δ = 2Δ  # Expand region  
          elif ρ < 0.25: Δ = 0.5Δ  # Shrink region  
          Update θ  
  2. Sequential Quadratic Programming (SQP)

    • Theory: Solves constrained problems via a sequence of quadratic subproblems.
    • Example: Trajectory optimization for rockets.
    • Pseudo-Code:
      Initialize θ, Lagrange multipliers λ  
      while not converged:  
          Compute Hessian H, gradient ∇f, Jacobian J
          Solve QP: min 0.5sᵀHs + ∇fᵀs s.t. Js + c = 0
          Update θ = θ + s, λ = new multipliers  

2. Metaheuristics (Global Optimization)

Evolutionary Algorithms

  1. Genetic Algorithm (GA)

    • Theory: Evolves solutions using selection, crossover, and mutation.
    • Example: Optimizing wind farm layouts.
    • Pseudo-Code:
      Initialize population  
      while generations < max_gen:  
          Evaluate fitness  
          Select parents via tournament/roulette  
          Crossover parents to create offspring  
          Mutate offspring  
          Replace worst individuals  
  2. Differential Evolution (DE)

    • Theory: Generates mutants by adding scaled differences between population vectors.
    • Example: Optimizing drug discovery pipelines.
    • Pseudo-Code (a Go sketch follows at the end of this subsection):
      Initialize population
      for each individual x_i:
          Select distinct x_a, x_b, x_c ≠ x_i
          mutant = x_a + F*(x_b - x_c)
          trial = crossover(x_i, mutant)
          if f(trial) < f(x_i): replace x_i
  3. Evolution Strategies (ES)

    • Theory: Adapts mutation strength based on success history.
    • Example: Training robotic control policies.
    • Pseudo-Code:
      Initialize θ, σ (mutation strength)  
      while not converged:  
          Generate offspring by θ + σ*N(0, I)  
          Evaluate fitness  
          Update θ and σ based on top offspring  
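
As promised under Differential Evolution above, here is a minimal DE/rand/1/bin sketch in Go. The values F = 0.8, CR = 0.9, the [-5, 5] initialization box, and the sphere-style caller-supplied objective are common textbook defaults, used here purely for illustration; it assumes popSize ≥ 4.

package main

import "math/rand"

// differentialEvolution implements DE/rand/1/bin for minimizing f.
func differentialEvolution(f func([]float64) float64, dim, popSize, generations int) []float64 {
	const F, CR = 0.8, 0.9 // differential weight and crossover rate
	pop := make([][]float64, popSize)
	for i := range pop {
		pop[i] = make([]float64, dim)
		for d := range pop[i] {
			pop[i][d] = rand.Float64()*10 - 5
		}
	}
	for g := 0; g < generations; g++ {
		for i := range pop {
			// Select three distinct vectors, all different from x_i
			a, b, c := rand.Intn(popSize), rand.Intn(popSize), rand.Intn(popSize)
			for a == i {
				a = rand.Intn(popSize)
			}
			for b == i || b == a {
				b = rand.Intn(popSize)
			}
			for c == i || c == a || c == b {
				c = rand.Intn(popSize)
			}
			// Mutation and binomial crossover
			trial := append([]float64{}, pop[i]...)
			jRand := rand.Intn(dim) // guarantee at least one mutated coordinate
			for d := 0; d < dim; d++ {
				if rand.Float64() < CR || d == jRand {
					trial[d] = pop[a][d] + F*(pop[b][d]-pop[c][d])
				}
			}
			if f(trial) < f(pop[i]) {
				pop[i] = trial // greedy replacement
			}
		}
	}
	best := pop[0]
	for _, x := range pop {
		if f(x) < f(best) {
			best = x
		}
	}
	return best
}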

Swarm Intelligence

  1. Particle Swarm Optimization (PSO)

    • Theory: Particles adjust trajectories based on personal and global best positions.
    • Example: Optimizing antenna array designs.
    • Pseudo-Code:
      Initialize particles with positions and velocities  
      while not converged:  
          for each particle:  
              v = w*v + c1*rand()*(pbest - x) + c2*rand()*(gbest - x)  
              x = x + v  
              Update pbest and gbest  
  2. Ant Colony Optimization (ACO)

    • Theory: Ants probabilistically build paths using pheromone trails.
    • Example: Optimizing traffic light cycles.
    • Pseudo-Code:
      Initialize pheromone trails τ  
      while not converged:  
          for each ant:  
              Build path via τ probabilities  
          Update τ (evaporate + deposit based on path quality)  
  3. Artificial Bee Colony (ABC)

    • Theory: Simulates bees (employed, onlookers, scouts) foraging for optimal solutions.
    • Example: Task scheduling in cloud computing.
    • Pseudo-Code:
      Initialize food sources (solutions)  
      while cycles < max_cycles:  
          Employed bees: Explore nearby solutions  
          Onlooker bees: Select solutions probabilistically  
          Scout bees: Replace abandoned solutions  

Physics/Chemistry-Inspired

  1. Simulated Annealing (SA)

    • Theory: Accepts worse solutions early to escape local optima, mimicking metal cooling.
    • Example: VLSI chip placement.
    • Pseudo-Code:
      Initialize T (temperature), current_solution  
      while T > T_min:  
          new_solution = perturb(current_solution)  
          ΔE = f(new) - f(current)  
          if ΔE < 0 or rand() < exp(-ΔE/T):  
              current_solution = new_solution  
          T = cool(T)  
  2. Harmony Search

    • Theory: Mimics musicians improvising to find harmonies (solutions).
    • Example: Optimizing water distribution networks.
    • Pseudo-Code:
      Initialize harmony memory (HM)  
      while not converged:  
          Improvise new harmony via HM or randomness  
          if new harmony better than worst in HM:  
              Replace worst in HM  
  3. Gravitational Search

    • Theory: Masses (solutions) attract each other based on gravitational force.
    • Example: Image segmentation.
    • Pseudo-Code:
      Initialize masses with positions  
      while not converged:  
          Calculate gravitational forces between masses  
          Update velocities and positions  
          Evaluate fitness and update masses  

Hybrids

  1. Memetic Algorithms

    • Theory: Combines GA with local search for refinement.
    • Example: Optimizing cancer radiotherapy plans.
    • Pseudo-Code:
      Initialize population  
      while generations < max_gen:  
          Perform GA operations (crossover/mutation)  
          Apply local search to top individuals  
  2. Hyper-Heuristics

    • Theory: Selects or generates heuristics dynamically.
    • Example: Automated algorithm selection for SAT solvers.
    • Pseudo-Code:
      Initialize set of low-level heuristics  
      while not converged:  
          Select heuristic based on performance history  
          Apply heuristic to current solution  

3. Stochastic Optimization

  1. Stochastic Gradient Descent (SGD)

    • Theory: Uses random mini-batches to approximate gradients.
    • Example: Training deep neural networks.
    • Pseudo-Code (a Go sketch follows at the end of this subsection):
      Initialize θ, α
      for epoch in 1 to max_epochs:
          Shuffle data
          for each mini-batch:
              θ = θ - α*∇J(θ, mini-batch)
  2. Randomized Algorithms

    • Theory: Uses randomness to solve deterministic problems efficiently.
    • Example: Randomized SVD for large matrices.
    • Pseudo-Code:
      Generate random matrix Ω  
      Compute sketch Y = AΩ
      Orthonormalize Y to get Q  
      Compute B = QᵀA  
      Perform SVD on B  
  3. Stochastic Approximation

    • Theory: Iteratively estimates solutions with noisy observations.
    • Example: Reinforcement learning (Q-learning).
    • Pseudo-Code:
      Initialize θ  
      for t = 1, 2, ...:  
          Observe noisy gradient g_t  
          θ = θ - α_t * g_t  
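
Returning to the SGD recipe above (item 1), the following Go sketch fits a linear model y ≈ w·x + b by stochastic gradient descent on squared error. The learning rate and the batch size of one are illustrative simplifications.

package main

import "math/rand"

// sgdLinearRegression fits y ≈ w·x + b by SGD on (1/2)(pred - y)².
func sgdLinearRegression(xs [][]float64, ys []float64, alpha float64, epochs int) ([]float64, float64) {
	dim := len(xs[0])
	w := make([]float64, dim)
	b := 0.0
	idx := make([]int, len(xs))
	for i := range idx {
		idx[i] = i
	}
	for epoch := 0; epoch < epochs; epoch++ {
		// Shuffle data each epoch
		rand.Shuffle(len(idx), func(i, j int) { idx[i], idx[j] = idx[j], idx[i] })
		for _, i := range idx { // mini-batch of size 1 for simplicity
			pred := b
			for d, x := range xs[i] {
				pred += w[d] * x
			}
			err := pred - ys[i]
			// Gradient step on each weight and the bias
			for d, x := range xs[i] {
				w[d] -= alpha * err * x
			}
			b -= alpha * err
		}
	}
	return w, b
}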

4. Convex Optimization

  1. Interior-Point Methods

    • Theory: Uses barrier functions to handle constraints.
    • Example: Portfolio optimization with risk constraints.
    • Pseudo-Code:
      Initialize x, μ > 0  
      while μ > tolerance:  
          Solve min f(x) - μ·Σᵢ log(c_i(x))
          μ = σμ (σ ∈ (0,1))  
  2. Subgradient Methods

    • Theory: Generalizes gradient descent for non-differentiable functions.
    • Example: Training SVMs with hinge loss.
    • Pseudo-Code:
      Initialize θ, α  
      while not converged:  
          Compute subgradient g  
          θ = θ - α*g  
  3. Proximal Algorithms (ADMM)

    • Theory: Splits problems into subproblems with proximal operators.
    • Example: Distributed matrix factorization.
    • Pseudo-Code:
      Initialize x, z, y  
      while not converged:  
          x = argmin_x [f(x) + (ρ/2)||x - z + y||²]  
          z = argmin_z [g(z) + (ρ/2)||x - z + y||²]  
          y = y + (x - z)  
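
The ADMM splitting above takes a fully closed form for a toy lasso problem, min (1/2)||x - a||² + λ||z||₁ subject to x = z (identity design, scaled dual u). The sketch below is illustrative; ρ, λ, and the fixed iteration count are arbitrary choices.

package main

// softThreshold is the proximal operator of k·|·| (soft thresholding).
func softThreshold(v, k float64) float64 {
	switch {
	case v > k:
		return v - k
	case v < -k:
		return v + k
	default:
		return 0
	}
}

// admmLasso solves min (1/2)||x-a||² + λ||z||₁ s.t. x = z (scaled dual u).
func admmLasso(a []float64, lambda, rho float64, iterations int) []float64 {
	n := len(a)
	x, z, u := make([]float64, n), make([]float64, n), make([]float64, n)
	for iter := 0; iter < iterations; iter++ {
		for i := 0; i < n; i++ {
			// x-update: argmin (1/2)(x-aᵢ)² + (ρ/2)(x - zᵢ + uᵢ)²
			x[i] = (a[i] + rho*(z[i]-u[i])) / (1 + rho)
			// z-update: proximal step on the ℓ1 term
			z[i] = softThreshold(x[i]+u[i], lambda/rho)
			// dual ascent on the consensus constraint x = z
			u[i] += x[i] - z[i]
		}
	}
	return z
}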

5. Constrained Optimization

  1. Penalty Methods

    • Theory: Adds penalty terms to the objective for constraint violations.
    • Example: Designing structures with material limits.
    • Pseudo-Code:
      Initialize θ, μ = 1  
      while not converged:  
          Solve min f(θ) + μ*penalty(c(θ))  
          μ = 10*μ  
  2. Barrier Methods

    • Theory: Uses logarithmic barriers to enforce constraints.
    • Example: Optimal control in robotics.
    • Pseudo-Code:
      Initialize θ, μ = 1  
      while μ > tolerance:  
          Solve min f(θ) - μ·Σᵢ log(-c_i(θ))
          μ = 0.1*μ  
  3. Augmented Lagrangian

    • Theory: Combines Lagrangian multipliers with penalty terms.
    • Example: Economic dispatch in power systems.
    • Pseudo-Code:
      Initialize λ, μ  
      while not converged:  
          Solve min f(x) + λᵀc(x) + (μ/2)||c(x)||²  
          λ = λ + μ*c(x)  
          μ = increase_if_needed  
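
A scalar example makes the augmented Lagrangian loop above concrete: minimize f(x) = (x - 2)² subject to c(x) = x - 1 = 0, whose inner minimization has a closed form. The doubling rule for μ is one common choice among many.

package main

import (
	"fmt"
	"math"
)

// augmentedLagrangian minimizes f(x) = (x-2)² subject to c(x) = x-1 = 0.
func augmentedLagrangian() float64 {
	lambda, mu := 0.0, 1.0
	x := 0.0
	for iter := 0; iter < 50; iter++ {
		// Inner solve: min (x-2)² + λ(x-1) + (μ/2)(x-1)², i.e.
		// 2(x-2) + λ + μ(x-1) = 0, which is linear in x
		x = (4 - lambda + mu) / (2 + mu)
		c := x - 1
		lambda += mu * c // multiplier update
		if math.Abs(c) > 1e-3 {
			mu *= 2 // increase penalty only while the violation persists
		}
	}
	return x // → 1, with λ → 2
}

func main() {
	fmt.Println(augmentedLagrangian())
}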

6. Combinatorial Optimization

  1. Branch and Bound

    • Theory: Explores solution space via tree traversal with pruning.
    • Example: Solving the knapsack problem.
    • Pseudo-Code:
      Initialize queue with root node, best = ∞
      while queue not empty:
          node = dequeue()
          if node.cost >= best: prune
          elif node is feasible: update best
          else: branch into child nodes
  2. Dynamic Programming

    • Theory: Reuses solutions to overlapping subproblems.
    • Example: RNA secondary structure prediction.
    • Pseudo-Code:
      Initialize memo table  
      for i in 1 to n:  
          for j in 1 to m:  
              dp[i][j] = min(dp[i-1][j], dp[i][j-1], ...)  
  3. Greedy Algorithms

    • Theory: Makes locally optimal choices at each step.
    • Example: Huffman coding for data compression.
    • Pseudo-Code:
      Sort items by some criterion  
      while solution not complete:  
          Select next best item  
          if feasible: add to solution  
  4. Tabu Search

    • Theory: Avoids revisiting solutions using a tabu list.
    • Example: Vehicle routing problems.
    • Pseudo-Code:
      Initialize current_solution, tabu_list  
      while not converged:  
          Generate neighbors not in tabu_list  
          Select best neighbor  
          Update current_solution and tabu_list  
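
The tabu-search loop above maps directly to code. The Go sketch below searches over bit vectors with single-bit-flip moves, a fixed tabu tenure, and the standard aspiration criterion (a tabu move is allowed if it beats the best solution found so far); the objective is supplied by the caller.

package main

import "math/rand"

// tabuSearch minimizes objective over bit vectors using single-bit-flip
// moves. A flipped index is tabu for tenure iterations unless the move
// improves the best solution found so far (aspiration criterion).
func tabuSearch(objective func([]bool) float64, n, iterations, tenure int) []bool {
	current := make([]bool, n)
	for i := range current {
		current[i] = rand.Intn(2) == 1
	}
	best := append([]bool{}, current...)
	bestVal := objective(best)
	tabuUntil := make([]int, n) // iteration until which flipping bit i is tabu

	for iter := 0; iter < iterations; iter++ {
		bestMove, bestMoveVal := -1, 0.0
		for i := 0; i < n; i++ {
			current[i] = !current[i] // try flipping bit i
			val := objective(current)
			current[i] = !current[i] // undo
			tabu := iter < tabuUntil[i]
			aspiration := val < bestVal
			if (!tabu || aspiration) && (bestMove == -1 || val < bestMoveVal) {
				bestMove, bestMoveVal = i, val
			}
		}
		if bestMove == -1 {
			continue // every move is tabu this iteration
		}
		current[bestMove] = !current[bestMove]
		tabuUntil[bestMove] = iter + tenure
		if bestMoveVal < bestVal {
			bestVal = bestMoveVal
			best = append([]bool{}, current...)
		}
	}
	return best
}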

7. Multi-Objective Optimization

  1. NSGA-II

    • Theory: Uses non-dominated sorting and crowding distance for Pareto fronts.
    • Example: Designing eco-friendly buildings.
    • Pseudo-Code:
      Initialize population  
      while generations < max_gen:  
          Create offspring via crossover/mutation  
          Combine parents + offspring  
          Rank by non-domination and crowding  
          Select top N individuals  
  2. MOEA/D

    • Theory: Decomposes multi-objective problems into scalar subproblems.
    • Example: Optimizing turbine efficiency and cost.
    • Pseudo-Code:
      Initialize weight vectors  
      for each subproblem:  
          Solve using neighboring subproblems  
          Update solutions and neighborhoods  
  3. Pareto Front Methods

    • Theory: Directly approximates the Pareto optimal set.
    • Example: Aerospace design trade-offs.
    • Pseudo-Code:
      Initialize population  
      while not converged:  
          Update solutions using dominance criteria  
          Maintain archive of non-dominated solutions  
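
All three multi-objective methods above rest on the same primitive: the Pareto dominance test. Below is a minimal Go sketch of the test and of extracting the non-dominated set, assuming every objective is minimized.

package main

// dominates reports whether a dominates b: no worse in every objective
// and strictly better in at least one (all objectives minimized).
func dominates(a, b []float64) bool {
	strictlyBetter := false
	for i := range a {
		if a[i] > b[i] {
			return false
		}
		if a[i] < b[i] {
			strictlyBetter = true
		}
	}
	return strictlyBetter
}

// paretoFront returns the non-dominated subset of points.
func paretoFront(points [][]float64) [][]float64 {
	var front [][]float64
	for i, p := range points {
		dominated := false
		for j, q := range points {
			if i != j && dominates(q, p) {
				dominated = true
				break
			}
		}
		if !dominated {
			front = append(front, p)
		}
	}
	return front
}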

8. Derivative-Free Optimization

  1. DIRECT

    • Theory: Divides search space into hyper-rectangles for global optimization.
    • Example: Tuning hyperparameters for climate models.
    • Pseudo-Code:
      Initialize hyper-rectangles  
      while budget not exhausted:  
          Identify potentially optimal rectangles  
          Split and sample new points  
  2. Bayesian Optimization

    • Theory: Uses probabilistic models (e.g., Gaussian processes) to guide search.
    • Example: Optimizing machine learning hyperparameters.
    • Pseudo-Code:
      Initialize surrogate model  
      for i in 1 to max_iter:  
          Select x* maximizing acquisition function  
          Evaluate f(x*)  
          Update surrogate model  
  3. Response Surface Methods

    • Theory: Fits a surrogate model (e.g., polynomial) to approximate the objective.
    • Example: Optimizing chemical reaction conditions.
    • Pseudo-Code:
      Design initial experiments (e.g., Latin hypercube)  
      while not converged:  
          Fit surrogate model to data  
          Propose new points via model predictions  

9. Machine Learning-Specific

  1. Adam

    • Theory: Combines momentum and adaptive learning rates.
    • Example: Training transformers for NLP.
    • Pseudo-Code (a runnable Go sketch follows at the end of this subsection):
      Initialize m=0, v=0, β1=0.9, β2=0.999
      for t in 1 to max_iter:
          Compute gradient g
          m = β1*m + (1-β1)*g
          v = β2*v + (1-β2)*g²
          m_hat = m / (1-β1^t)
          v_hat = v / (1-β2^t)
          θ = θ - α*m_hat / (sqrt(v_hat) + ε)
  2. RMSprop

    • Theory: Adapts learning rates using moving average of squared gradients.
    • Example: Training recurrent neural networks (RNNs).
    • Pseudo-Code:
      Initialize cache=0, decay_rate=0.9  
      while not converged:  
          cache = decay_rate*cache + (1-decay_rate)*g²  
          θ = θ - α*g / (sqrt(cache) + ε)  
  3. Adagrad

    • Theory: Adapts learning rates per-parameter based on historical gradients.
    • Example: Sparse data problems (e.g., click-through rate prediction).
    • Pseudo-Code:
      Initialize G=0  
      while not converged:  
          G = G + g²  
          θ = θ - α*g / (sqrt(G) + ε)  
  4. AdaDelta

    • Theory: Improves Adagrad by limiting historical gradient accumulation.
    • Example: Training deep autoencoders.
    • Pseudo-Code:
      Initialize E[g²]=0, E[Δθ²]=0  
      while not converged:  
          E[g²] = ρ*E[g²] + (1-ρ)*g²  
          Δθ = - (sqrt(E[Δθ²] + ε) / sqrt(E[g²] + ε)) * g  
          θ = θ + Δθ  
          E[Δθ²] = ρ*E[Δθ²] + (1-ρ)*Δθ²  
  5. Nesterov Accelerated Gradient

    • Theory: "Looks ahead" to adjust momentum for faster convergence.
    • Example: Training large-scale convolutional networks.
    • Pseudo-Code:
      Initialize v=0, α, μ
      while not converged:
          v = μ*v - α*∇J(θ + μ*v)
          θ = θ + v
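
As noted under Adam above, the update rule translates directly into code. This Go sketch applies one Adam step with the default hyperparameters from the pseudo-code; the AdamState struct and constructor are conveniences for this example, not a standard API.

package main

import "math"

// AdamState holds the first/second moment estimates and step counter.
type AdamState struct {
	M, V []float64
	T    int
}

func newAdamState(n int) *AdamState {
	return &AdamState{M: make([]float64, n), V: make([]float64, n)}
}

// adamStep applies one Adam update to theta given gradient g.
func adamStep(s *AdamState, theta, g []float64, alpha float64) {
	const beta1, beta2, eps = 0.9, 0.999, 1e-8
	s.T++
	for i := range theta {
		s.M[i] = beta1*s.M[i] + (1-beta1)*g[i]
		s.V[i] = beta2*s.V[i] + (1-beta2)*g[i]*g[i]
		mHat := s.M[i] / (1 - math.Pow(beta1, float64(s.T))) // bias correction
		vHat := s.V[i] / (1 - math.Pow(beta2, float64(s.T)))
		theta[i] -= alpha * mHat / (math.Sqrt(vHat) + eps)
	}
}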

10. Quantum Optimization

  1. Quantum Annealing

    • Theory: Exploits quantum tunneling to find global minima of Ising models.
    • Example: Solving protein folding problems.
    • Pseudo-Code (Conceptual):
      Encode problem into Hamiltonian H_P
      Prepare initial state |ψ⟩ (ground state of driver H₀)
      Adiabatically evolve from H₀ to H_P
      Measure final state for solution
  2. QAOA

    • Theory: Approximates solutions via parameterized quantum circuits.
    • Example: Portfolio optimization.
    • Pseudo-Code:
      Initialize parameters γ, β
      for iterations:
          Prepare |ψ(γ, β)⟩ via quantum circuit
          Measure ⟨H⟩
          Classically optimize γ, β to minimize ⟨H⟩

11. Specialized Methods

  1. Quadratic Programming

    • Theory: Optimizes quadratic objectives with linear constraints.
    • Example: Portfolio optimization (Markowitz model).
    • Pseudo-Code:
      Convert to standard form: min 0.5xᵀQx + cᵀx s.t. Ax ≤ b
      Solve via active-set or interior-point methods  
  2. Semidefinite Programming

    • Theory: Optimizes over symmetric positive semidefinite matrices.
    • Example: Sensor network localization.
    • Pseudo-Code:
      Convert to standard form: min ⟨C, X⟩ s.t. ⟨A_i, X⟩ = b_i, X ⪰ 0
      Solve via interior-point methods  
  3. Interval Methods

    • Theory: Uses interval arithmetic to bound solutions rigorously.
    • Example: Verified global optimization in robotics.
    • Pseudo-Code:
      Initialize interval X  
      while width(X) > tolerance:  
          Subdivide X into subintervals  
          Prune subintervals violating constraints  

Five Most Important Algorithms

Below is a breakdown of five of the most widely used optimization algorithms, including theoretical foundations, real-world examples, and concurrent Go implementations. The focus is on clarity, correctness, and parallelism.


1. Genetic Algorithm (GA)

Theory

  • Inspired by natural selection and genetics
  • Key Steps:
    1. Population: Initialize candidate solutions
    2. Fitness: Evaluate solution quality
    3. Selection: Choose parents (e.g., tournament selection)
    4. Crossover: Combine parent genes
    5. Mutation: Randomly alter genes
    6. Replacement: Create new generation

Examples

  1. Logistics Routing: Optimize delivery routes for fuel efficiency
  2. Neural Architecture Search: Evolve optimal neural network structures
  3. Drug Discovery: Find molecular configurations with desired properties

Concurrent Go Implementation

package main

import (
	"math/rand"
	"sync"
	"time"
)

type Individual struct {
	Genes   []float64
	Fitness float64
}

func evaluate(individual *Individual, wg *sync.WaitGroup) {
	defer wg.Done()
	// Example objective: minimize sum of squares
	sum := 0.0
	for _, gene := range individual.Genes {
		sum += gene * gene
	}
	individual.Fitness = sum
}

// evaluateAll runs fitness evaluations concurrently, one goroutine per individual.
func evaluateAll(population []Individual) {
	var wg sync.WaitGroup
	for i := range population {
		wg.Add(1)
		go evaluate(&population[i], &wg)
	}
	wg.Wait()
}

// tournamentSelection returns the fitter of two randomly chosen individuals.
func tournamentSelection(population []Individual) Individual {
	a := population[rand.Intn(len(population))]
	b := population[rand.Intn(len(population))]
	if a.Fitness < b.Fitness {
		return a
	}
	return b
}

// crossover performs single-point crossover, producing two children.
func crossover(p1, p2 Individual) (Individual, Individual) {
	point := rand.Intn(len(p1.Genes))
	c1 := append(append([]float64{}, p1.Genes[:point]...), p2.Genes[point:]...)
	c2 := append(append([]float64{}, p2.Genes[:point]...), p1.Genes[point:]...)
	return Individual{Genes: c1}, Individual{Genes: c2}
}

// mutate adds Gaussian noise to each gene with small probability.
func mutate(ind Individual) Individual {
	for i := range ind.Genes {
		if rand.Float64() < 0.05 {
			ind.Genes[i] += rand.NormFloat64() * 0.1
		}
	}
	return ind
}

// geneticAlgorithm assumes popSize is even.
func geneticAlgorithm(popSize, dimensions, generations int) *Individual {
	rand.Seed(time.Now().UnixNano())
	population := make([]Individual, popSize)

	// Initialize population
	for i := range population {
		genes := make([]float64, dimensions)
		for j := range genes {
			genes[j] = rand.NormFloat64()
		}
		population[i] = Individual{Genes: genes}
	}

	for gen := 0; gen < generations; gen++ {
		// Concurrent fitness evaluation
		evaluateAll(population)

		// Selection, crossover, mutation (simplified)
		newPopulation := make([]Individual, popSize)
		for i := 0; i < popSize; i += 2 {
			parent1 := tournamentSelection(population)
			parent2 := tournamentSelection(population)
			child1, child2 := crossover(parent1, parent2)
			newPopulation[i] = mutate(child1)
			newPopulation[i+1] = mutate(child2)
		}
		population = newPopulation
	}

	// Evaluate the final generation before selecting the best individual
	evaluateAll(population)
	best := &population[0]
	for i := range population {
		if population[i].Fitness < best.Fitness {
			best = &population[i]
		}
	}
	return best
}

2. Ant Colony Optimization (ACO)

Theory

  • Mimics ant foraging behavior using pheromones
  • Key Components:
    1. Pheromone Matrix: Chemical trail intensity between nodes
    2. Path Construction: Probabilistic node selection
    3. Pheromone Update: Evaporation + reinforcement

Examples

  1. Network Routing: Optimize data packet paths in telecommunications
  2. Task Scheduling: Schedule jobs on parallel machines
  3. Traveling Salesman: Find shortest round-trip routes

Concurrent Go Implementation

package main

import (
	"math"
	"math/rand"
	"sync"
)

type Ant struct {
	Tour     []int
	Visited  []bool
	Distance float64
}

// initializePheromones starts every trail at 1.0.
func initializePheromones(n int) [][]float64 {
	p := make([][]float64, n)
	for i := range p {
		p[i] = make([]float64, n)
		for j := range p[i] {
			p[i][j] = 1.0
		}
	}
	return p
}

func acoParallel(graph [][]float64, ants, iterations int) []int {
	pheromones := initializePheromones(len(graph))
	bestTour := make([]int, len(graph))
	bestLength := math.MaxFloat64
	var mutex sync.Mutex

	for iter := 0; iter < iterations; iter++ {
		antColony := make([]Ant, ants)
		var wg sync.WaitGroup
		// Concurrent ant exploration
		for i := 0; i < ants; i++ {
			wg.Add(1)
			go func(antID int) {
				defer wg.Done()
				ant := constructTour(graph, pheromones)
				antColony[antID] = ant // each goroutine writes only its own slot
				mutex.Lock()
				if ant.Distance < bestLength {
					bestLength = ant.Distance
					copy(bestTour, ant.Tour)
				}
				mutex.Unlock()
			}(i)
		}
		wg.Wait()
		updatePheromones(pheromones, antColony)
	}
	return bestTour
}

// constructTour builds a tour with a randomized greedy rule favoring edges with strong pheromone and short distance.
func constructTour(graph, pheromones [][]float64) Ant {
	n := len(graph)
	ant := Ant{Tour: make([]int, 0, n), Visited: make([]bool, n)}
	current := rand.Intn(n)
	ant.Tour = append(ant.Tour, current)
	ant.Visited[current] = true
	for len(ant.Tour) < n {
		next, bestScore := -1, -1.0
		for j := 0; j < n; j++ {
			if ant.Visited[j] {
				continue
			}
			score := rand.Float64() * pheromones[current][j] / (graph[current][j] + 1e-9)
			if score > bestScore {
				next, bestScore = j, score
			}
		}
		ant.Distance += graph[current][next]
		current = next
		ant.Tour = append(ant.Tour, current)
		ant.Visited[current] = true
	}
	ant.Distance += graph[current][ant.Tour[0]] // close the tour
	return ant
}

// updatePheromones evaporates all trails, then deposits pheromone on each ant's tour in proportion to its quality.
func updatePheromones(pheromones [][]float64, colony []Ant) {
	const evaporation = 0.5
	for i := range pheromones {
		for j := range pheromones[i] {
			pheromones[i][j] *= 1 - evaporation
		}
	}
	for _, ant := range colony {
		deposit := 1.0 / (ant.Distance + 1e-9)
		for k := 0; k+1 < len(ant.Tour); k++ {
			pheromones[ant.Tour[k]][ant.Tour[k+1]] += deposit
		}
	}
}

3. Particle Swarm Optimization (PSO)

Theory

  • Social behavior of bird flocks/fish schools
  • Key Equations:
    • Velocity: v = w*v + c1*rand()*(pbest - x) + c2*rand()*(gbest - x)
    • Position: x = x + v

Examples

  1. Drone Swarm Control: Coordinate UAV formations
  2. Hyperparameter Tuning: Optimize ML model parameters
  3. Antenna Design: Maximize radiation efficiency

Concurrent Go Implementation

package main

import (
	"math"
	"math/rand"
	"sync"
)

type Particle struct {
	Position []float64
	Velocity []float64
	PBest    []float64
	PBestFit float64
}

// fitness is the objective to minimize; the sphere function is an example.
func fitness(x []float64) float64 {
	sum := 0.0
	for _, v := range x {
		sum += v * v
	}
	return sum
}

func psoConcurrent(dimensions, particles, epochs int) []float64 {
	const w, c1, c2 = 0.729, 1.494, 1.494 // standard constriction values
	swarm := make([]Particle, particles)
	gbest := make([]float64, dimensions)
	gbestFit := math.MaxFloat64
	var mutex sync.Mutex

	// Initialize swarm with random positions and velocities
	for i := range swarm {
		swarm[i].Position = make([]float64, dimensions)
		swarm[i].Velocity = make([]float64, dimensions)
		for d := 0; d < dimensions; d++ {
			swarm[i].Position[d] = rand.Float64()*10 - 5
			swarm[i].Velocity[d] = rand.Float64()*2 - 1
		}
		swarm[i].PBest = append([]float64{}, swarm[i].Position...)
		swarm[i].PBestFit = fitness(swarm[i].Position)
		if swarm[i].PBestFit < gbestFit {
			gbestFit = swarm[i].PBestFit
			copy(gbest, swarm[i].Position)
		}
	}

	for epoch := 0; epoch < epochs; epoch++ {
		// Snapshot gbest so goroutines read a consistent value without racing
		mutex.Lock()
		g := append([]float64{}, gbest...)
		mutex.Unlock()

		var wg sync.WaitGroup
		// Concurrent particle updates
		for i := range swarm {
			wg.Add(1)
			go func(i int) {
				defer wg.Done()
				p := &swarm[i]
				for d := range p.Position {
					p.Velocity[d] = w*p.Velocity[d] +
						c1*rand.Float64()*(p.PBest[d]-p.Position[d]) +
						c2*rand.Float64()*(g[d]-p.Position[d])
					p.Position[d] += p.Velocity[d]
				}
				currentFit := fitness(p.Position)
				if currentFit < p.PBestFit {
					p.PBest = append([]float64{}, p.Position...)
					p.PBestFit = currentFit
					mutex.Lock()
					if currentFit < gbestFit {
						gbestFit = currentFit
						gbest = append([]float64{}, p.Position...)
					}
					mutex.Unlock()
				}
			}(i)
		}
		wg.Wait()
	}
	return gbest
}

4. Bayesian Optimization (BO)

Theory

  1. Surrogate Model: Gaussian Process (GP) approximates objective
  2. Acquisition Function: Balances exploration/exploitation (e.g., EI, UCB)
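
For intuition on the acquisition function, Expected Improvement (EI) has a closed form when the surrogate's posterior at a candidate point is Gaussian. A small Go sketch for minimization, where μ and σ would come from the GP posterior:

package main

import "math"

// expectedImprovement computes EI for minimization at a point whose GP
// posterior is N(mu, sigma²), given the best objective value seen so far.
func expectedImprovement(mu, sigma, best float64) float64 {
	if sigma <= 0 {
		return math.Max(best-mu, 0)
	}
	z := (best - mu) / sigma
	cdf := 0.5 * (1 + math.Erf(z/math.Sqrt2)) // standard normal CDF Φ(z)
	pdf := math.Exp(-z*z/2) / math.Sqrt(2*math.Pi) // standard normal PDF φ(z)
	return (best-mu)*cdf + sigma*pdf
}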

Examples

  1. Hyperparameter Tuning: Optimize learning rates for DL models
  2. Material Science: Discover alloys with desired properties
  3. Robotics: Optimize controller parameters

Concurrent Go Implementation

package main

import "sync"

// GaussianProcess is a placeholder for a GP surrogate model.
type GaussianProcess struct {
	// Mean/covariance functions and fitted data omitted
}

// generateCandidates and evaluateObjective are problem-specific;
// trivial stand-ins are used here so the sketch compiles.
func generateCandidates() [][]float64 { return [][]float64{{0.1}, {0.5}, {0.9}} }

func evaluateObjective(point []float64) float64 { return point[0] * point[0] }

func bayesianOptimizationConcurrent() {
	// Parallel evaluation of candidate points
	var wg sync.WaitGroup
	candidates := generateCandidates()
	results := make(chan float64, len(candidates))

	for _, x := range candidates {
		wg.Add(1)
		go func(point []float64) {
			defer wg.Done()
			results <- evaluateObjective(point)
		}(x)
	}

	wg.Wait()
	close(results)

	// Update surrogate model with new data
	// Select next point via acquisition function
}

5. Simulated Annealing (SA)

Theory

  1. Temperature Schedule: Controls acceptance probability
  2. Metropolis Criterion: P(accept) = exp(-ΔE/T)

Examples

  1. VLSI Chip Design: Optimize component placement
  2. Portfolio Optimization: Balance risk/return
  3. Protein Folding: Find minimum-energy configurations

Concurrent Go Implementation

package main

import (
	"math"
	"math/rand"
	"sync"
)

// energy is the objective to minimize; the sphere function is an example.
func energy(x []float64) float64 {
	sum := 0.0
	for _, v := range x {
		sum += v * v
	}
	return sum
}

// generateNeighbors perturbs the current solution with Gaussian noise.
func generateNeighbors(current []float64) [][]float64 {
	neighbors := make([][]float64, 8)
	for i := range neighbors {
		n := append([]float64{}, current...)
		for d := range n {
			n[d] += rand.NormFloat64() * 0.1
		}
		neighbors[i] = n
	}
	return neighbors
}

// accept applies the Metropolis criterion: P(accept) = exp(-ΔE/T).
func accept(candidate, current []float64, T float64) bool {
	dE := energy(candidate) - energy(current)
	return dE < 0 || rand.Float64() < math.Exp(-dE/T)
}

func simulatedAnnealingConcurrent(initialSolution []float64, maxIter int) []float64 {
	current := initialSolution
	best := append([]float64{}, current...)
	T := 1000.0
	cooling := 0.95

	for iter := 0; iter < maxIter; iter++ {
		var wg sync.WaitGroup
		neighbors := generateNeighbors(current)
		results := make(chan []float64, len(neighbors))

		// Concurrent neighbor evaluation; current is read-only in this phase
		for _, neighbor := range neighbors {
			wg.Add(1)
			go func(n []float64) {
				defer wg.Done()
				if accept(n, current, T) {
					results <- n
				}
			}(neighbor)
		}

		wg.Wait()
		close(results)

		// Process accepted neighbors; this runs after wg.Wait, so no mutex is needed
		for newSol := range results {
			if energy(newSol) < energy(best) {
				best = append([]float64{}, newSol...)
			}
			current = newSol
		}

		T *= cooling
	}
	return best
}

Key Implementation Notes

  1. Concurrency Patterns:

    • Goroutines for parallel fitness evaluations
    • sync.WaitGroup for task synchronization
    • sync.Mutex for shared resource protection
  2. Parameter Tuning:

    • GA: Mutation rate (0.01-0.1), crossover rate (0.7-0.9)
    • PSO: ω = 0.729, c1 = c2 = 1.494
    • SA: Cooling rate (0.95-0.99)
  3. Benchmark Functions:

    • Sphere: f(x) = Σ xᵢ²
    • Rastrigin: f(x) = 10n + Σ (xᵢ² - 10cos(2πxᵢ))
    • Ackley: Multimodal function with many local minima
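
For reference, the three benchmarks in their standard textbook forms, as Go functions (each has minimum 0 at the origin):

package main

import "math"

// Sphere: unimodal, minimum 0 at the origin.
func sphere(x []float64) float64 {
	sum := 0.0
	for _, v := range x {
		sum += v * v
	}
	return sum
}

// Rastrigin: highly multimodal, minimum 0 at the origin.
func rastrigin(x []float64) float64 {
	sum := 10.0 * float64(len(x))
	for _, v := range x {
		sum += v*v - 10*math.Cos(2*math.Pi*v)
	}
	return sum
}

// Ackley: multimodal with many local minima, minimum 0 at the origin.
func ackley(x []float64) float64 {
	n := float64(len(x))
	sumSq, sumCos := 0.0, 0.0
	for _, v := range x {
		sumSq += v * v
		sumCos += math.Cos(2 * math.Pi * v)
	}
	return -20*math.Exp(-0.2*math.Sqrt(sumSq/n)) - math.Exp(sumCos/n) + 20 + math.E
}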