- Gradient Descent & Variants (SGD, Adam, RMSprop)
  - Where: Machine Learning (training neural networks, linear/logistic regression).
  - Why:
    - Simplicity: Easy to implement for differentiable objectives.
    - Scalability: Stochastic variants (SGD) handle large datasets.
    - Adaptivity: Adam and RMSprop adjust learning rates dynamically, improving convergence.
  - Example: Training deep learning models (e.g., ResNet, GPT).
- Genetic Algorithms
  - Where: Engineering design, combinatorial optimization (e.g., aerospace, logistics).
  - Why:
    - Global Search: Avoids local optima in non-convex/non-differentiable problems.
    - Flexibility: Works with discrete/continuous variables.
  - Example: Optimizing aircraft wing shapes or factory layouts.
- Simplex Method (Linear Programming)
  - Where: Operations Research (supply chain, resource allocation).
  - Why:
    - Efficiency: Solves linear problems with thousands of variables.
    - Robustness: Well-understood for real-world LP problems.
  - Example: Maximizing profit in manufacturing resource allocation.
- Particle Swarm Optimization (PSO)
  - Where: Control systems, robotics, antenna design.
  - Why:
    - Swarm Intelligence: Balances exploration and exploitation without gradients.
    - Parallelism: Easy to parallelize for distributed systems.
  - Example: Tuning PID controllers in industrial automation.
- Interior-Point Methods
  - Where: Convex optimization (finance, engineering).
  - Why:
    - Speed: Polynomial-time convergence for large convex problems.
    - Accuracy: Handles ill-conditioned problems better than Simplex.
  - Example: Portfolio optimization (Markowitz model).
- Bayesian Optimization
  - Where: Hyperparameter tuning (ML/AI), drug discovery.
  - Why:
    - Sample Efficiency: Minimizes expensive evaluations (e.g., training large models).
    - No Gradients: Ideal for black-box functions.
  - Example: Optimizing learning rates/batch sizes for deep learning.
- Simulated Annealing
  - Where: VLSI design, scheduling, materials science.
  - Why:
    - Escapes Local Optima: Probabilistic acceptance of worse solutions.
    - Simplicity: Easy to implement for combinatorial problems.
  - Example: Optimizing chip layouts to minimize heat dissipation.
- Quasi-Newton Methods (L-BFGS)
  - Where: Medium-scale ML models (e.g., logistic regression, SVMs).
  - Why:
    - Memory Efficiency: Approximates the Hessian without storing large matrices.
    - Speed: Faster convergence than vanilla GD for smooth functions.
  - Example: Training medium-sized neural networks.
- Multi-Objective Evolutionary Algorithms (NSGA-II)
  - Where: Engineering design, economics, environmental planning.
  - Why:
    - Pareto Optimality: Maintains diverse solutions on the trade-off front.
    - Scalability: Handles multiple conflicting objectives.
  - Example: Designing cars for cost, safety, and fuel efficiency.
- ADMM (Alternating Direction Method of Multipliers)
  - Where: Distributed systems, image processing, finance.
  - Why:
    - Decentralization: Splits problems into subproblems for parallel solving.
    - Convergence: Provable for convex problems; often effective on non-smooth or non-convex ones.
  - Example: Distributed training of federated learning models.
Problem-Specific Strengths:
- Gradient-based methods (GD, Adam) dominate ML due to automatic differentiation frameworks (PyTorch, TensorFlow).
- Metaheuristics (GA, PSO) excel in non-differentiable/combinatorial spaces.
- Interior-point methods solve large convex problems faster than Simplex.
Scalability:
- SGD and Adam handle big data via mini-batches.
- Bayesian Optimization minimizes costly evaluations (e.g., lab experiments).
Ease of Use:
- Libraries like scipy.optimize and Optuna abstract away much of the complexity.
Theoretical Guarantees:
- Convex methods (Interior-Point, ADMM) have convergence proofs.
Emerging Trends:
- Quantum Optimization: QAOA for finance/chemistry (still experimental).
- AutoML: Algorithm selection via meta-learning (e.g., Google’s Vizier).
- Differentiable Optimization: Integrating solvers into neural networks (e.g., CVXPYLayer).
Gradient Descent:
graph TD
A[Initialize Parameters] --> B[Compute Gradient]
B --> C[Update Parameters]
C --> D{Converged?}
D -->|No| B
D -->|Yes| E[Return Parameters]
Genetic Algorithm:
graph TD
A[Initialize Population] --> B[Evaluate Fitness]
B --> C[Select Parents]
C --> D[Crossover]
D --> E[Mutation]
E --> F[Replace Population]
F --> G{Max Generations?}
G -->|No| B
G -->|Yes| H[Return Best]
Simplex Method:
graph TD
A[Initialize Tableau] --> B[Select Pivot Column]
B --> C[Select Pivot Row]
C --> D[Perform Pivot]
D --> E{Optimal?}
E -->|No| B
E -->|Yes| F[Return Solution]
Particle Swarm Optimization:
graph TD
A[Initialize Particles] --> B[Update Velocities]
B --> C[Update Positions]
C --> D[Evaluate Fitness]
D --> E[Update Best]
E --> F{Stopping Condition?}
F -->|No| B
F -->|Yes| G[Return Best]
Adam:
graph TD
A[Initialize m, v] --> B[Compute Gradient]
B --> C[Update m]
C --> D[Update v]
D --> E[Bias Correction]
E --> F[Update Parameters]
F --> G{Converged?}
G -->|No| B
G -->|Yes| H[Return Parameters]
Simulated Annealing:
graph TD
A[Initialize T, Solution] --> B[Perturb Solution]
B --> C[Calculate ΔE]
C --> D{Accept?}
D -->|Yes| E[Update Solution]
D -->|No| F[Reject]
E --> G[Cool T]
F --> G
G --> H{T > T_min?}
H -->|Yes| B
H -->|No| I[Return Best]
Ant Colony Optimization:
graph TD
A[Initialize Pheromones] --> B[Ants Build Paths]
B --> C[Update Pheromones]
C --> D{Stopping Condition?}
D -->|No| B
D -->|Yes| E[Return Best Path]
Bayesian Optimization:
graph TD
A[Initialize Model] --> B[Select New Point]
B --> C[Evaluate Function]
C --> D[Update Model]
D --> E{Max Iterations?}
E -->|No| B
E -->|Yes| F[Return Best]
NSGA-II:
graph TD
A[Initialize Population] --> B[Non-Dominated Sort]
B --> C[Crowding Distance]
C --> D[Select Top]
D --> E[Crossover/Mutation]
E --> F{Generations < Max?}
F -->|Yes| B
F -->|No| G[Return Pareto Front]
ADMM:
graph TD
A[Initialize x, z, y] --> B[Update x]
B --> C[Update z]
C --> D[Update y]
D --> E{Converged?}
E -->|No| B
E -->|Yes| F[Return x, z]
Levenberg-Marquardt:
graph TD
A[Initialize] --> B[Compute Residual]
B --> C[Solve Equation]
C --> D[Update Parameters]
D --> E{Error Reduced?}
E -->|Yes| F[Adjust λ]
E -->|No| G[Adjust λ]
F --> H{Converged?}
G --> H
H -->|No| B
H -->|Yes| I[Return Parameters]
Nelder-Mead:
graph TD
A[Initialize Simplex] --> B[Sort Points]
B --> C[Compute Centroid]
C --> D[Reflect]
D --> E{Improved?}
E -->|Yes| F[Expand]
E -->|No| G[Contract]
F --> H[Replace]
G --> H
H --> I{Converged?}
I -->|No| B
I -->|Yes| J[Return Best]
QAOA (variational quantum loop):
graph TD
A[Initialize Parameters] --> B[Prepare State]
B --> C[Measure Energy]
C --> D[Optimize Parameters]
D --> E{Converged?}
E -->|No| B
E -->|Yes| F[Return Parameters]
Branch and Bound:
graph TD
A[Initialize Queue] --> G{Queue Empty?}
G -->|Yes| H[Return Best]
G -->|No| B[Dequeue Node]
B --> C{Node Valid?}
C -->|Yes| D[Branch]
C -->|No| E[Prune]
D --> F[Enqueue Children]
F --> G
E --> G
Differential Evolution:
graph TD
A[Initialize Population] --> B[For Each Vector]
B --> C[Select Parents]
C --> D[Generate Mutant]
D --> E[Crossover]
E --> F{Better?}
F -->|Yes| G[Replace]
F -->|No| H[Keep]
G --> I{All Done?}
H --> I
I -->|No| B
I -->|Yes| J{Converged?}
J -->|No| B
J -->|Yes| K[Return Best]
- Gradient Descent
  - Theory: Iteratively updates parameters in the direction of the negative gradient to minimize a function.
  - Example: Training linear/logistic regression models.
  - Pseudo-Code (a Go sketch follows this entry):
      Initialize θ, learning rate α
      while not converged:
          Compute gradient ∇J(θ)
          θ = θ - α * ∇J(θ)
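To make the update rule concrete, here is a minimal Go sketch of gradient descent on the quadratic f(θ) = (θ - 3)². The learning rate, iteration cap, and stopping threshold are illustrative choices, not values fixed by the text above.

```go
package main

import (
	"fmt"
	"math"
)

// gradientDescent minimizes a function via its gradient grad, starting from
// theta0. alpha is the learning rate; iteration stops when the step is tiny.
func gradientDescent(grad func(float64) float64, theta0, alpha float64, maxIter int) float64 {
	theta := theta0
	for i := 0; i < maxIter; i++ {
		step := alpha * grad(theta)
		theta -= step
		if math.Abs(step) < 1e-9 { // converged
			break
		}
	}
	return theta
}

func main() {
	// f(θ) = (θ-3)², so ∇f(θ) = 2(θ-3); the minimum is at θ = 3.
	grad := func(theta float64) float64 { return 2 * (theta - 3) }
	fmt.Println(gradientDescent(grad, 0, 0.1, 1000)) // ≈ 3
}
```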
- Conjugate Gradient
  - Theory: Solves linear systems by generating conjugate (A-orthogonal) search directions to accelerate convergence.
  - Example: Solving large sparse systems (e.g., finite element analysis).
  - Pseudo-Code (a Go sketch follows this entry):
      Initialize θ, residual r = b - Aθ, direction d = r
      while ||r|| > tolerance:
          α = (rᵀr) / (dᵀAd)
          θ = θ + α*d
          r_new = r - α*A*d
          β = (r_newᵀr_new) / (rᵀr)
          d = r_new + β*d
          r = r_new
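The same recurrence translates directly to Go. This sketch solves a hard-coded 2×2 symmetric positive-definite system and is illustrative only; the helper names dot and matVec are mine.

```go
package main

import (
	"fmt"
	"math"
)

// conjugateGradient solves A·x = b for symmetric positive-definite A,
// starting from x = 0 (so the initial residual is b).
func conjugateGradient(A [][]float64, b []float64, tol float64) []float64 {
	n := len(b)
	x := make([]float64, n)
	r := append([]float64{}, b...)
	d := append([]float64{}, r...)
	rsOld := dot(r, r)
	for iter := 0; iter < 10*n; iter++ {
		Ad := matVec(A, d)
		alpha := rsOld / dot(d, Ad)
		for i := range x {
			x[i] += alpha * d[i]
			r[i] -= alpha * Ad[i]
		}
		rsNew := dot(r, r)
		if math.Sqrt(rsNew) < tol {
			break
		}
		beta := rsNew / rsOld
		for i := range d {
			d[i] = r[i] + beta*d[i]
		}
		rsOld = rsNew
	}
	return x
}

func dot(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func matVec(A [][]float64, v []float64) []float64 {
	out := make([]float64, len(A))
	for i, row := range A {
		out[i] = dot(row, v)
	}
	return out
}

func main() {
	A := [][]float64{{4, 1}, {1, 3}}
	b := []float64{1, 2}
	fmt.Println(conjugateGradient(A, b, 1e-10)) // ≈ [0.0909, 0.6364]
}
```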
- Newton's Method
  - Theory: Uses second derivatives (Hessian) for faster convergence near minima.
  - Example: Portfolio optimization in finance.
  - Pseudo-Code:
      Initialize θ
      while not converged:
          Compute gradient ∇J(θ), Hessian H
          θ = θ - H⁻¹∇J(θ)
- Quasi-Newton (BFGS, L-BFGS)
  - Theory: Approximates the Hessian using gradient updates to avoid direct computation.
  - Example: Training support vector machines (SVMs).
  - Pseudo-Code (BFGS):
      Initialize θ, B = I (identity matrix)
      while not converged:
          Compute direction d = -B⁻¹∇J(θ)
          Perform line search for step size α
          s = α*d
          y = ∇J(θ + s) - ∇J(θ)
          B = B + (y yᵀ)/(yᵀs) - (B s sᵀ B)/(sᵀ B s)
          θ = θ + s
- Levenberg-Marquardt
  - Theory: Hybrid of gradient descent and Gauss-Newton for nonlinear least squares.
  - Example: Calibrating camera parameters in computer vision.
  - Pseudo-Code:
      Initialize θ, λ = 0.001
      while not converged:
          Compute residual r, Jacobian J
          Δθ = (JᵀJ + λI)⁻¹ Jᵀr
          if error(θ + Δθ) < error(θ):
              θ = θ + Δθ
              λ = λ/10
          else:
              λ = λ*10
- Nelder-Mead (Simplex)
  - Theory: Adjusts a simplex through reflection, expansion, and contraction.
  - Example: Tuning hyperparameters for industrial simulations.
  - Pseudo-Code:
      Initialize simplex with n+1 points
      while not converged:
          Sort points by f(x)
          Compute centroid (excluding worst)
          Reflect worst point over centroid
          if f(reflected) < best: Expand
          elif f(reflected) < worst: Replace worst
          else: Contract or shrink simplex
- Pattern Search
  - Theory: Explores points in a predefined geometric pattern (e.g., grid).
  - Example: Optimizing aircraft wing shapes.
  - Pseudo-Code:
      Initialize θ, step size Δ
      while Δ > tolerance:
          Evaluate f(θ ± Δ in all directions)
          if improvement found:
              Move θ to better point
              Δ = 1.5*Δ   # expand step
          else:
              Δ = 0.5*Δ   # shrink step
- Simplex Method
  - Theory: Moves along vertices of the feasible region to find the optimal solution.
  - Example: Maximizing profit in manufacturing resource allocation.
  - Pseudo-Code:
      Convert to standard form (Ax = b, x ≥ 0)
      while pivot exists:
          Select entering variable (most negative coefficient)
          Select leaving variable (min ratio test)
          Pivot to update tableau
- Interior-Point Methods
  - Theory: Traverses the interior of the feasible region using barrier functions.
  - Example: Solving large-scale logistics optimization.
  - Pseudo-Code:
      Initialize x > 0, μ > 0, tolerance
      while duality gap > tolerance:
          Solve Newton step for barrier problem
          Update x, μ = σμ (σ ∈ (0,1))
- Trust Region Methods
  - Theory: Approximates the objective locally with a model (e.g., quadratic) within a trusted region.
  - Example: Fitting nonlinear curves to sensor data.
  - Pseudo-Code:
      Initialize θ, trust region radius Δ
      while not converged:
          Solve subproblem: min m_k(θ + s) subject to ||s|| ≤ Δ
          Compute actual/predicted reduction ratio ρ
          if ρ > 0.75: Δ = 2Δ      # expand region
          elif ρ < 0.25: Δ = 0.5Δ  # shrink region
          Update θ
- Sequential Quadratic Programming (SQP)
  - Theory: Solves constrained problems via a sequence of quadratic subproblems.
  - Example: Trajectory optimization for rockets.
  - Pseudo-Code:
      Initialize θ, Lagrange multipliers λ
      while not converged:
          Compute Hessian H, gradient ∇f, Jacobian J
          Solve QP: min 0.5sᵀHs + ∇fᵀs  s.t.  Js + c = 0
          Update θ = θ + s, λ = new multipliers
- Genetic Algorithm (GA)
  - Theory: Evolves solutions using selection, crossover, and mutation.
  - Example: Optimizing wind farm layouts.
  - Pseudo-Code:
      Initialize population
      while generations < max_gen:
          Evaluate fitness
          Select parents via tournament/roulette
          Crossover parents to create offspring
          Mutate offspring
          Replace worst individuals
- Differential Evolution (DE)
  - Theory: Generates mutants by adding scaled differences between population vectors.
  - Example: Optimizing drug discovery pipelines.
  - Pseudo-Code (a Go sketch follows this entry):
      Initialize population
      for each individual x_i:
          Select x_a, x_b, x_c ≠ x_i
          mutant = x_a + F*(x_b - x_c)
          trial = crossover(x_i, mutant)
          if f(trial) < f(x_i): replace x_i
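A compact Go sketch of the common DE/rand/1/bin variant on the sphere function. F, CR, the search bounds, and the population settings are typical illustrative defaults, not values prescribed above; the population size must be at least 4.

```go
package main

import (
	"fmt"
	"math/rand"
)

func sphere(x []float64) float64 {
	s := 0.0
	for _, v := range x {
		s += v * v
	}
	return s
}

func differentialEvolution(dim, popSize, maxGen int) []float64 {
	const F, CR = 0.8, 0.9 // illustrative defaults
	pop := make([][]float64, popSize)
	for i := range pop {
		pop[i] = make([]float64, dim)
		for j := range pop[i] {
			pop[i][j] = rand.Float64()*10 - 5 // init in [-5, 5]
		}
	}
	for gen := 0; gen < maxGen; gen++ {
		for i := range pop {
			// Pick three distinct vectors a, b, c, all different from i
			a := rand.Intn(popSize)
			for a == i {
				a = rand.Intn(popSize)
			}
			b := rand.Intn(popSize)
			for b == i || b == a {
				b = rand.Intn(popSize)
			}
			c := rand.Intn(popSize)
			for c == i || c == a || c == b {
				c = rand.Intn(popSize)
			}
			trial := make([]float64, dim)
			jRand := rand.Intn(dim) // guarantees at least one mutated gene
			for j := range trial {
				if rand.Float64() < CR || j == jRand {
					trial[j] = pop[a][j] + F*(pop[b][j]-pop[c][j])
				} else {
					trial[j] = pop[i][j]
				}
			}
			if sphere(trial) < sphere(pop[i]) {
				pop[i] = trial // greedy replacement
			}
		}
	}
	best := pop[0]
	for _, x := range pop {
		if sphere(x) < sphere(best) {
			best = x
		}
	}
	return best
}

func main() {
	fmt.Println(differentialEvolution(5, 40, 200)) // ≈ all zeros
}
```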
- Evolution Strategies (ES)
  - Theory: Adapts mutation strength based on success history.
  - Example: Training robotic control policies.
  - Pseudo-Code:
      Initialize θ, σ (mutation strength)
      while not converged:
          Generate offspring by θ + σ*N(0, I)
          Evaluate fitness
          Update θ and σ based on top offspring
- Particle Swarm Optimization (PSO)
  - Theory: Particles adjust trajectories based on personal and global best positions.
  - Example: Optimizing antenna array designs.
  - Pseudo-Code:
      Initialize particles with positions and velocities
      while not converged:
          for each particle:
              v = w*v + c1*rand()*(pbest - x) + c2*rand()*(gbest - x)
              x = x + v
          Update pbest and gbest
- Ant Colony Optimization (ACO)
  - Theory: Ants probabilistically build paths using pheromone trails.
  - Example: Optimizing traffic light cycles.
  - Pseudo-Code:
      Initialize pheromone trails τ
      while not converged:
          for each ant:
              Build path via τ probabilities
          Update τ (evaporate + deposit based on path quality)
- Artificial Bee Colony (ABC)
  - Theory: Simulates bees (employed, onlookers, scouts) foraging for optimal solutions.
  - Example: Task scheduling in cloud computing.
  - Pseudo-Code:
      Initialize food sources (solutions)
      while cycles < max_cycles:
          Employed bees: explore nearby solutions
          Onlooker bees: select solutions probabilistically
          Scout bees: replace abandoned solutions
- Simulated Annealing (SA)
  - Theory: Accepts worse solutions early to escape local optima, mimicking metal cooling.
  - Example: VLSI chip placement.
  - Pseudo-Code:
      Initialize T (temperature), current_solution
      while T > T_min:
          new_solution = perturb(current_solution)
          ΔE = f(new) - f(current)
          if ΔE < 0 or rand() < exp(-ΔE/T):
              current_solution = new_solution
          T = cool(T)
- Harmony Search
  - Theory: Mimics musicians improvising to find harmonies (solutions).
  - Example: Optimizing water distribution networks.
  - Pseudo-Code:
      Initialize harmony memory (HM)
      while not converged:
          Improvise new harmony via HM or randomness
          if new harmony better than worst in HM:
              Replace worst in HM
- Gravitational Search
  - Theory: Masses (solutions) attract each other based on gravitational force.
  - Example: Image segmentation.
  - Pseudo-Code:
      Initialize masses with positions
      while not converged:
          Calculate gravitational forces between masses
          Update velocities and positions
          Evaluate fitness and update masses
- Memetic Algorithms
  - Theory: Combines GA with local search for refinement.
  - Example: Optimizing cancer radiotherapy plans.
  - Pseudo-Code:
      Initialize population
      while generations < max_gen:
          Perform GA operations (crossover/mutation)
          Apply local search to top individuals
- Hyper-Heuristics
  - Theory: Selects or generates heuristics dynamically.
  - Example: Automated algorithm selection for SAT solvers.
  - Pseudo-Code:
      Initialize set of low-level heuristics
      while not converged:
          Select heuristic based on performance history
          Apply heuristic to current solution
- Stochastic Gradient Descent (SGD)
  - Theory: Uses random mini-batches to approximate gradients.
  - Example: Training deep neural networks.
  - Pseudo-Code:
      Initialize θ, α
      for epoch in 1 to max_epochs:
          Shuffle data
          for each mini-batch:
              θ = θ - α*∇J(θ, mini-batch)
- Randomized Algorithms
  - Theory: Uses randomness to solve deterministic problems efficiently.
  - Example: Randomized SVD for large matrices.
  - Pseudo-Code:
      Generate random matrix Ω
      Compute sketch Y = AΩ
      Orthonormalize Y to get Q
      Compute B = QᵀA
      Perform SVD on B
- Stochastic Approximation
  - Theory: Iteratively estimates solutions with noisy observations.
  - Example: Reinforcement learning (Q-learning).
  - Pseudo-Code:
      Initialize θ
      for t = 1, 2, ...:
          Observe noisy gradient g_t
          θ = θ - α_t * g_t
- Interior-Point Methods
  - Theory: Uses barrier functions to handle constraints.
  - Example: Portfolio optimization with risk constraints.
  - Pseudo-Code:
      Initialize x, μ > 0
      while μ > tolerance:
          Solve min f(x) - μ∑log(c_i(x))
          μ = σμ (σ ∈ (0,1))
- Subgradient Methods
  - Theory: Generalizes gradient descent for non-differentiable functions.
  - Example: Training SVMs with hinge loss.
  - Pseudo-Code:
      Initialize θ, α
      while not converged:
          Compute subgradient g
          θ = θ - α*g
- Proximal Algorithms (ADMM)
  - Theory: Splits problems into subproblems with proximal operators.
  - Example: Distributed matrix factorization.
  - Pseudo-Code (a Go toy example follows this entry):
      Initialize x, z, y
      while not converged:
          x = argmin_x [f(x) + (ρ/2)||x - z + y||²]
          z = argmin_z [g(z) + (ρ/2)||x - z + y||²]
          y = y + (x - z)
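As a toy illustration of the splitting, the scalar lasso-style problem min (1/2)(x − a)² + λ|z| subject to x = z is solved by exactly the three updates above: the x-update has a closed form, the z-update is the soft-thresholding proximal operator, and y is the scaled dual variable. The values of a, λ, and ρ below are illustrative.

```go
package main

import "fmt"

// softThreshold is the proximal operator of k·|z|.
func softThreshold(v, k float64) float64 {
	switch {
	case v > k:
		return v - k
	case v < -k:
		return v + k
	default:
		return 0
	}
}

// admmScalarLasso runs ADMM on min (1/2)(x-a)² + λ|z| s.t. x = z (scaled dual form).
func admmScalarLasso(a, lambda, rho float64, iters int) (x, z float64) {
	var u float64 // scaled dual variable
	for i := 0; i < iters; i++ {
		// x-update: argmin (1/2)(x-a)² + (ρ/2)(x - z + u)²
		x = (a + rho*(z-u)) / (1 + rho)
		// z-update: proximal step on λ|z|
		z = softThreshold(x+u, lambda/rho)
		// dual update
		u += x - z
	}
	return x, z
}

func main() {
	x, z := admmScalarLasso(2.0, 0.5, 1.0, 100)
	fmt.Println(x, z) // both ≈ 1.5, i.e., softThreshold(2.0, 0.5)
}
```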
- Penalty Methods
  - Theory: Adds penalty terms to the objective for constraint violations.
  - Example: Designing structures with material limits.
  - Pseudo-Code:
      Initialize θ, μ = 1
      while not converged:
          Solve min f(θ) + μ*penalty(c(θ))
          μ = 10*μ
- Barrier Methods
  - Theory: Uses logarithmic barriers to enforce constraints.
  - Example: Optimal control in robotics.
  - Pseudo-Code:
      Initialize θ, μ = 1
      while μ > tolerance:
          Solve min f(θ) - μ∑log(-c_i(θ))
          μ = 0.1*μ
- Augmented Lagrangian
  - Theory: Combines Lagrangian multipliers with penalty terms.
  - Example: Economic dispatch in power systems.
  - Pseudo-Code:
      Initialize λ, μ
      while not converged:
          Solve min f(x) + λᵀc(x) + (μ/2)||c(x)||²
          λ = λ + μ*c(x)
          μ = increase_if_needed
- Branch and Bound
  - Theory: Explores the solution space via tree traversal with pruning.
  - Example: Solving the knapsack problem.
  - Pseudo-Code:
      Initialize queue with root node, best = ∞
      while queue not empty:
          node = dequeue()
          if node.cost >= best: prune
          elif node is feasible: update best
          else: branch into child nodes
- Dynamic Programming
  - Theory: Reuses solutions to overlapping subproblems.
  - Example: RNA secondary structure prediction.
  - Pseudo-Code (a Go sketch follows this entry):
      Initialize memo table
      for i in 1 to n:
          for j in 1 to m:
              dp[i][j] = min(dp[i-1][j], dp[i][j-1], ...)
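A minimal Go sketch of the tabulation pattern, using the classic 0/1 knapsack as a stand-in example (the RNA application follows the same fill-the-table idea).

```go
package main

import "fmt"

// knapsack returns the best value achievable within the given capacity,
// reusing solutions to overlapping subproblems via a DP table.
func knapsack(weights, values []int, capacity int) int {
	n := len(weights)
	// dp[i][c] = best value using the first i items with capacity c
	dp := make([][]int, n+1)
	for i := range dp {
		dp[i] = make([]int, capacity+1)
	}
	for i := 1; i <= n; i++ {
		for c := 0; c <= capacity; c++ {
			dp[i][c] = dp[i-1][c] // option 1: skip item i
			if w := weights[i-1]; w <= c {
				if take := dp[i-1][c-w] + values[i-1]; take > dp[i][c] {
					dp[i][c] = take // option 2: take item i
				}
			}
		}
	}
	return dp[n][capacity]
}

func main() {
	fmt.Println(knapsack([]int{2, 3, 4}, []int{3, 4, 5}, 5)) // 7 (items 1 and 2)
}
```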
- Greedy Algorithms
  - Theory: Makes locally optimal choices at each step.
  - Example: Huffman coding for data compression.
  - Pseudo-Code:
      Sort items by some criterion
      while solution not complete:
          Select next best item
          if feasible: add to solution
- Tabu Search
  - Theory: Avoids revisiting solutions using a tabu list.
  - Example: Vehicle routing problems.
  - Pseudo-Code:
      Initialize current_solution, tabu_list
      while not converged:
          Generate neighbors not in tabu_list
          Select best neighbor
          Update current_solution and tabu_list
- NSGA-II
  - Theory: Uses non-dominated sorting and crowding distance for Pareto fronts.
  - Example: Designing eco-friendly buildings.
  - Pseudo-Code:
      Initialize population
      while generations < max_gen:
          Create offspring via crossover/mutation
          Combine parents + offspring
          Rank by non-domination and crowding
          Select top N individuals
- MOEA/D
  - Theory: Decomposes multi-objective problems into scalar subproblems.
  - Example: Optimizing turbine efficiency and cost.
  - Pseudo-Code:
      Initialize weight vectors
      for each subproblem:
          Solve using neighboring subproblems
          Update solutions and neighborhoods
- Pareto Front Methods
  - Theory: Directly approximates the Pareto optimal set.
  - Example: Aerospace design trade-offs.
  - Pseudo-Code:
      Initialize population
      while not converged:
          Update solutions using dominance criteria
          Maintain archive of non-dominated solutions
- DIRECT
  - Theory: Divides the search space into hyper-rectangles for global optimization.
  - Example: Tuning hyperparameters for climate models.
  - Pseudo-Code:
      Initialize hyper-rectangles
      while budget not exhausted:
          Identify potentially optimal rectangles
          Split and sample new points
- Bayesian Optimization
  - Theory: Uses probabilistic models (e.g., Gaussian processes) to guide search.
  - Example: Optimizing machine learning hyperparameters.
  - Pseudo-Code:
      Initialize surrogate model
      for i in 1 to max_iter:
          Select x* maximizing acquisition function
          Evaluate f(x*)
          Update surrogate model
- Response Surface Methods
  - Theory: Fits a surrogate model (e.g., polynomial) to approximate the objective.
  - Example: Optimizing chemical reaction conditions.
  - Pseudo-Code:
      Design initial experiments (e.g., Latin hypercube)
      while not converged:
          Fit surrogate model to data
          Propose new points via model predictions
- Adam
  - Theory: Combines momentum and adaptive learning rates.
  - Example: Training transformers for NLP.
  - Pseudo-Code (a Go sketch follows this entry):
      Initialize m=0, v=0, β1=0.9, β2=0.999
      for t in 1 to max_iter:
          m = β1*m + (1-β1)*g
          v = β2*v + (1-β2)*g²
          m_hat = m / (1-β1^t)
          v_hat = v / (1-β2^t)
          θ = θ - α*m_hat / (sqrt(v_hat) + ε)
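The update translates nearly line-for-line into Go. This sketch runs Adam on the 1-D quadratic used in the gradient descent example above, with the standard default hyperparameters; the step count is an illustrative choice.

```go
package main

import (
	"fmt"
	"math"
)

// adam minimizes a function given its gradient, following the moment updates above.
func adam(grad func(float64) float64, theta float64, steps int) float64 {
	const alpha, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
	m, v := 0.0, 0.0
	for t := 1; t <= steps; t++ {
		g := grad(theta)
		m = beta1*m + (1-beta1)*g       // first-moment estimate
		v = beta2*v + (1-beta2)*g*g     // second-moment estimate
		mHat := m / (1 - math.Pow(beta1, float64(t))) // bias correction
		vHat := v / (1 - math.Pow(beta2, float64(t)))
		theta -= alpha * mHat / (math.Sqrt(vHat) + eps)
	}
	return theta
}

func main() {
	grad := func(x float64) float64 { return 2 * (x - 3) } // f(x) = (x-3)²
	fmt.Println(adam(grad, 0, 5000))                       // ≈ 3
}
```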
- RMSprop
  - Theory: Adapts learning rates using a moving average of squared gradients.
  - Example: Training recurrent neural networks (RNNs).
  - Pseudo-Code:
      Initialize cache=0, decay_rate=0.9
      while not converged:
          cache = decay_rate*cache + (1-decay_rate)*g²
          θ = θ - α*g / (sqrt(cache) + ε)
- Adagrad
  - Theory: Adapts learning rates per parameter based on historical gradients.
  - Example: Sparse data problems (e.g., click-through rate prediction).
  - Pseudo-Code:
      Initialize G=0
      while not converged:
          G = G + g²
          θ = θ - α*g / (sqrt(G) + ε)
- AdaDelta
  - Theory: Improves Adagrad by limiting historical gradient accumulation.
  - Example: Training deep autoencoders.
  - Pseudo-Code:
      Initialize E[g²]=0, E[Δθ²]=0
      while not converged:
          E[g²] = ρ*E[g²] + (1-ρ)*g²
          Δθ = - (sqrt(E[Δθ²] + ε) / sqrt(E[g²] + ε)) * g
          θ = θ + Δθ
          E[Δθ²] = ρ*E[Δθ²] + (1-ρ)*Δθ²
- Nesterov Accelerated Gradient
  - Theory: "Looks ahead" along the momentum direction for faster convergence.
  - Example: Training large-scale convolutional networks.
  - Pseudo-Code:
      Initialize v=0, α, μ
      while not converged:
          v = μ*v - α*∇J(θ + μ*v)
          θ = θ + v
- Quantum Annealing
  - Theory: Exploits quantum tunneling to find global minima of Ising models.
  - Example: Solving protein folding problems.
  - Pseudo-Code (conceptual):
      Encode problem into Hamiltonian H
      Prepare initial state |ψ₀⟩
      Adiabatically evolve from H₀ to H
      Measure final state for solution
- QAOA
  - Theory: Approximates solutions via parameterized quantum circuits.
  - Example: Portfolio optimization.
  - Pseudo-Code:
      Initialize parameters γ, β
      for iterations:
          Prepare |ψ(γ, β)⟩ via quantum circuit
          Measure ⟨H⟩
          Classically optimize γ, β to minimize ⟨H⟩
- Quadratic Programming
  - Theory: Optimizes quadratic objectives with linear constraints.
  - Example: Portfolio optimization (Markowitz model).
  - Pseudo-Code:
      Convert to standard form: min 0.5xᵀQx + cᵀx  s.t.  Ax ≤ b
      Solve via active-set or interior-point methods
- Semidefinite Programming
  - Theory: Optimizes over symmetric positive semidefinite matrices.
  - Example: Sensor network localization.
  - Pseudo-Code:
      Convert to standard form: min C • X  s.t.  A_i • X = b_i, X ⪰ 0
      Solve via interior-point methods
- Interval Methods
  - Theory: Uses interval arithmetic to bound solutions rigorously.
  - Example: Verified global optimization in robotics.
  - Pseudo-Code:
      Initialize interval X
      while width(X) > tolerance:
          Subdivide X into subintervals
          Prune subintervals violating constraints
Below is a comprehensive breakdown of the five optimization algorithms (Genetic Algorithm, Ant Colony Optimization, Particle Swarm Optimization, Bayesian Optimization, and Simulated Annealing), including theoretical foundations, real-world examples, and concurrent Go implementations. I'll focus on clarity, correctness, and parallelism.

Genetic Algorithm (GA)
- Inspired by natural selection and genetics
- Key Steps:
  - Population: Initialize candidate solutions
  - Fitness: Evaluate solution quality
  - Selection: Choose parents (e.g., tournament selection)
  - Crossover: Combine parent genes
  - Mutation: Randomly alter genes
  - Replacement: Create new generation
- Applications:
  - Logistics Routing: Optimize delivery routes for fuel efficiency
  - Neural Architecture Search: Evolve optimal neural network structures
  - Drug Discovery: Find molecular configurations with desired properties
package main
import (
"math/rand"
"sync"
"time"
)
type Individual struct {
Genes []float64
Fitness float64
}
func evaluate(individual *Individual, wg *sync.WaitGroup) {
defer wg.Done()
// Example: Minimize sum of squares
sum := 0.0
for _, gene := range individual.Genes {
sum += gene * gene
}
individual.Fitness = sum
}
func geneticAlgorithm(popSize, dimensions int, generations int) *Individual {
rand.Seed(time.Now().UnixNano())
population := make([]Individual, popSize)
// Initialize population
for i := range population {
genes := make([]float64, dimensions)
for j := range genes {
genes[j] = rand.NormFloat64()
}
population[i] = Individual{Genes: genes}
}
for gen := 0; gen < generations; gen++ {
var wg sync.WaitGroup
// Concurrent fitness evaluation
for i := range population {
wg.Add(1)
go evaluate(&population[i], &wg)
}
wg.Wait()
		// Selection, crossover, mutation (simplified; assumes popSize is even)
		newPopulation := make([]Individual, popSize)
		for i := 0; i < popSize; i += 2 {
parent1 := tournamentSelection(population)
parent2 := tournamentSelection(population)
child1, child2 := crossover(parent1, parent2)
newPopulation[i] = mutate(child1)
newPopulation[i+1] = mutate(child2)
}
population = newPopulation
	}
	// Evaluate the final generation so Fitness values are current before picking the best
	var finalWG sync.WaitGroup
	for i := range population {
		finalWG.Add(1)
		go evaluate(&population[i], &finalWG)
	}
	finalWG.Wait()
	best := &population[0]
for i := range population {
if population[i].Fitness < best.Fitness {
best = &population[i]
}
}
return best
}
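The snippet above assumes tournamentSelection, crossover, and mutate exist. Here is one minimal, hedged way to fill them in; the size-2 tournament, single-point crossover, and 5% mutation rate are illustrative choices, not part of the original.

```go
// tournamentSelection picks the fitter of two random individuals (size-2 tournament).
func tournamentSelection(pop []Individual) Individual {
	a := pop[rand.Intn(len(pop))]
	b := pop[rand.Intn(len(pop))]
	if a.Fitness < b.Fitness { // minimizing
		return a
	}
	return b
}

// crossover performs single-point crossover on the parents' genes.
func crossover(p1, p2 Individual) (Individual, Individual) {
	n := len(p1.Genes)
	point := rand.Intn(n)
	c1 := make([]float64, n)
	c2 := make([]float64, n)
	copy(c1, p1.Genes[:point])
	copy(c1[point:], p2.Genes[point:])
	copy(c2, p2.Genes[:point])
	copy(c2[point:], p1.Genes[point:])
	return Individual{Genes: c1}, Individual{Genes: c2}
}

// mutate perturbs each gene with a small probability.
func mutate(ind Individual) Individual {
	for i := range ind.Genes {
		if rand.Float64() < 0.05 { // illustrative mutation rate
			ind.Genes[i] += rand.NormFloat64() * 0.1
		}
	}
	return ind
}
```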
Ant Colony Optimization (ACO)
- Mimics ant foraging behavior using pheromones
- Key Components:
  - Pheromone Matrix: Chemical trail intensity between nodes
  - Path Construction: Probabilistic node selection
  - Pheromone Update: Evaporation + reinforcement
- Applications:
  - Network Routing: Optimize data packet paths in telecommunications
  - Task Scheduling: Schedule jobs on parallel machines
  - Traveling Salesman: Find shortest round-trip routes
package main
import (
"math"
"math/rand"
"sync"
)
type Ant struct {
Tour []int
Visited []bool
Distance float64
}
func acoParallel(graph [][]float64, ants int, iterations int) []int {
pheromones := initializePheromones(len(graph))
bestTour := make([]int, len(graph))
bestLength := math.MaxFloat64
var mutex sync.Mutex
for iter := 0; iter < iterations; iter++ {
antColony := make([]Ant, ants)
var wg sync.WaitGroup
// Concurrent ant exploration
for i := 0; i < ants; i++ {
wg.Add(1)
			go func(antID int) {
				defer wg.Done()
				ant := constructTour(graph, pheromones)
				antColony[antID] = ant // record for the pheromone update (distinct indices, no lock needed)
				mutex.Lock()
				if ant.Distance < bestLength {
					bestLength = ant.Distance
					copy(bestTour, ant.Tour)
				}
				mutex.Unlock()
			}(i)
}
wg.Wait()
updatePheromones(&pheromones, antColony)
}
return bestTour
}
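A hedged sketch of constructTour and the pheromone helpers that acoParallel relies on. The α, β, evaporation rate, and deposit constant are common ACO defaults, not values fixed by the text; pickNext is a helper name I introduce, and strictly positive edge distances are assumed.

```go
func initializePheromones(n int) [][]float64 {
	p := make([][]float64, n)
	for i := range p {
		p[i] = make([]float64, n)
		for j := range p[i] {
			p[i][j] = 1.0 // uniform initial trail
		}
	}
	return p
}

// constructTour builds a closed tour by repeated probabilistic node selection.
func constructTour(graph, pheromones [][]float64) Ant {
	n := len(graph)
	ant := Ant{Tour: make([]int, 0, n), Visited: make([]bool, n)}
	current := rand.Intn(n)
	ant.Tour = append(ant.Tour, current)
	ant.Visited[current] = true
	for len(ant.Tour) < n {
		next := pickNext(current, &ant, graph, pheromones)
		ant.Distance += graph[current][next]
		current = next
		ant.Tour = append(ant.Tour, current)
		ant.Visited[current] = true
	}
	ant.Distance += graph[current][ant.Tour[0]] // close the loop
	return ant
}

// pickNext samples a city with probability ∝ pheromone^α · (1/distance)^β.
func pickNext(current int, ant *Ant, graph, pheromones [][]float64) int {
	const alpha, beta = 1.0, 2.0 // illustrative defaults
	weights := make([]float64, len(graph))
	total := 0.0
	for j := range graph {
		if !ant.Visited[j] {
			weights[j] = math.Pow(pheromones[current][j], alpha) *
				math.Pow(1.0/graph[current][j], beta)
			total += weights[j]
		}
	}
	r := rand.Float64() * total // roulette-wheel selection
	for j, w := range weights {
		if w == 0 {
			continue
		}
		r -= w
		if r <= 0 {
			return j
		}
	}
	for j := range graph { // numerical fallback: first unvisited city
		if !ant.Visited[j] {
			return j
		}
	}
	return current
}

func updatePheromones(pheromones *[][]float64, colony []Ant) {
	const evaporation, q = 0.5, 100.0 // illustrative constants
	p := *pheromones
	for i := range p {
		for j := range p[i] {
			p[i][j] *= 1 - evaporation // evaporate
		}
	}
	for _, a := range colony {
		for k := 0; k+1 < len(a.Tour); k++ {
			i, j := a.Tour[k], a.Tour[k+1]
			p[i][j] += q / a.Distance // deposit ∝ path quality
			p[j][i] += q / a.Distance
		}
	}
}
```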
Particle Swarm Optimization (PSO)
- Based on the social behavior of bird flocks and fish schools
- Key Equations:
  - Velocity: v = w*v + c1*rand()*(pbest - x) + c2*rand()*(gbest - x)
  - Position: x = x + v
- Applications:
  - Drone Swarm Control: Coordinate UAV formations
  - Hyperparameter Tuning: Optimize ML model parameters
  - Antenna Design: Maximize radiation efficiency
package main
import (
"math"
"math/rand"
"sync"
)
type Particle struct {
Position []float64
Velocity []float64
PBest []float64
PBestFit float64
}
func psoConcurrent(dimensions, particles int, epochs int) []float64 {
swarm := make([]Particle, particles)
gbest := make([]float64, dimensions)
gbestFit := math.MaxFloat64
var mutex sync.Mutex
// Initialize swarm
for i := range swarm {
// ... initialization code ...
}
for epoch := 0; epoch < epochs; epoch++ {
var wg sync.WaitGroup
// Concurrent particle updates
		for i := range swarm {
			wg.Add(1)
			go func(i int) {
				defer wg.Done()
				// Snapshot the global best under the lock to avoid a data race
				mutex.Lock()
				gbestSnapshot := append([]float64{}, gbest...)
				mutex.Unlock()
				updateVelocity(&swarm[i], gbestSnapshot)
				updatePosition(&swarm[i])
currentFit := fitness(swarm[i].Position)
if currentFit < swarm[i].PBestFit {
swarm[i].PBest = append([]float64{}, swarm[i].Position...)
swarm[i].PBestFit = currentFit
mutex.Lock()
if currentFit < gbestFit {
gbestFit = currentFit
gbest = append([]float64{}, swarm[i].Position...)
}
mutex.Unlock()
}
}(i)
}
wg.Wait()
}
return gbest
}
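The helpers assumed above (updateVelocity, updatePosition, fitness) are sketched below. The velocity update needs the global best, so this version takes the snapshot passed in by psoConcurrent; w, c1, and c2 use the standard constriction values quoted in the notes at the end, and the sphere function stands in as the objective.

```go
// updateVelocity applies the velocity equation using the particle's personal
// best and a snapshot of the global best.
func updateVelocity(p *Particle, gbest []float64) {
	const w, c1, c2 = 0.729, 1.494, 1.494
	for d := range p.Velocity {
		p.Velocity[d] = w*p.Velocity[d] +
			c1*rand.Float64()*(p.PBest[d]-p.Position[d]) +
			c2*rand.Float64()*(gbest[d]-p.Position[d])
	}
}

func updatePosition(p *Particle) {
	for d := range p.Position {
		p.Position[d] += p.Velocity[d]
	}
}

// fitness is the objective being minimized; the sphere function is a stand-in.
func fitness(x []float64) float64 {
	sum := 0.0
	for _, v := range x {
		sum += v * v
	}
	return sum
}
```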
Bayesian Optimization
- Surrogate Model: Gaussian Process (GP) approximates the objective
- Acquisition Function: Balances exploration/exploitation (e.g., EI, UCB)
- Applications:
  - Hyperparameter Tuning: Optimize learning rates for DL models
  - Material Science: Discover alloys with desired properties
  - Robotics: Optimize controller parameters
package main
import (
	"math/rand"
	"sync"
	// a real surrogate model would also pull in a statistics/GP library (e.g., gonum)
)
type GaussianProcess struct {
// GP implementation
}
func bayesianOptimizationConcurrent() {
// Parallel evaluation of candidate points
var wg sync.WaitGroup
candidates := generateCandidates()
results := make(chan float64, len(candidates))
for _, x := range candidates {
wg.Add(1)
go func(point []float64) {
defer wg.Done()
results <- evaluate(point)
}(x)
}
wg.Wait()
close(results)
// Update surrogate model with new data
// Select next point via acquisition function
}
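generateCandidates and evaluate are placeholders in the skeleton above; minimal hedged stand-ins (random points in [0,1]² and a toy quadratic objective) could look like this. In a real run, evaluate would call the expensive black-box function.

```go
// generateCandidates proposes random points in the unit square (illustrative).
func generateCandidates() [][]float64 {
	const n, dim = 16, 2
	cands := make([][]float64, n)
	for i := range cands {
		cands[i] = make([]float64, dim)
		for j := range cands[i] {
			cands[i][j] = rand.Float64()
		}
	}
	return cands
}

// evaluate is a toy objective with its minimum at the center of the cube.
func evaluate(x []float64) float64 {
	sum := 0.0
	for _, v := range x {
		sum += (v - 0.5) * (v - 0.5)
	}
	return sum
}
```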
Simulated Annealing (SA)
- Temperature Schedule: Controls acceptance probability
- Metropolis Criterion: P(accept) = exp(-ΔE/T) for worse moves (ΔE > 0)
- Applications:
  - VLSI Chip Design: Optimize component placement
  - Portfolio Optimization: Balance risk/return
  - Protein Folding: Find minimum-energy configurations
package main
import (
"math"
"math/rand"
"sync"
)
func simulatedAnnealingConcurrent(initialSolution []float64, maxIter int) []float64 {
current := initialSolution
best := append([]float64{}, current...)
	T := 1000.0
	cooling := 0.95
for iter := 0; iter < maxIter; iter++ {
var wg sync.WaitGroup
neighbors := generateNeighbors(current)
results := make(chan []float64, len(neighbors))
// Concurrent neighbor evaluation
for _, neighbor := range neighbors {
wg.Add(1)
go func(n []float64) {
defer wg.Done()
if accept(n, current, T) {
results <- n
}
}(neighbor)
}
wg.Wait()
close(results)
		// Process accepted neighbors sequentially (the workers finished at Wait,
		// so no lock is needed here)
		for newSol := range results {
			if energy(newSol) < energy(best) {
				best = append([]float64{}, newSol...)
			}
			current = newSol
		}
T *= cooling
}
return best
}
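energy, generateNeighbors, and accept are assumed above; hedged sketches follow, using the sphere function as a stand-in objective, Gaussian perturbations, and the Metropolis rule. The neighbor count and step size are illustrative.

```go
// energy is the objective being minimized; the sphere function is a stand-in.
func energy(x []float64) float64 {
	sum := 0.0
	for _, v := range x {
		sum += v * v
	}
	return sum
}

// generateNeighbors perturbs one random coordinate per neighbor.
func generateNeighbors(x []float64) [][]float64 {
	const k = 8 // neighbors per iteration (illustrative)
	neighbors := make([][]float64, k)
	for i := range neighbors {
		n := append([]float64{}, x...)
		n[rand.Intn(len(n))] += rand.NormFloat64() * 0.1
		neighbors[i] = n
	}
	return neighbors
}

// accept implements the Metropolis criterion at temperature T.
func accept(candidate, current []float64, T float64) bool {
	dE := energy(candidate) - energy(current)
	return dE < 0 || rand.Float64() < math.Exp(-dE/T)
}
```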
- Concurrency Patterns:
  - Goroutines for parallel fitness evaluations
  - sync.WaitGroup for task synchronization
  - sync.Mutex for shared resource protection
- Parameter Tuning:
  - GA: mutation rate (0.01–0.1), crossover rate (0.7–0.9)
  - PSO: ω = 0.729, c1 = c2 = 1.494
  - SA: cooling rate (0.95–0.99)
- Benchmark Functions (Go implementations follow):
  - Sphere: f(x) = Σxᵢ²
  - Rastrigin: f(x) = 10n + Σ(xᵢ² − 10cos(2πxᵢ))
  - Ackley: Multimodal function with many local minima
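For completeness, here are the three benchmarks in Go, dimension-generic; these are the standard textbook formulas.

```go
package main

import (
	"fmt"
	"math"
)

// Sphere: f(x) = Σ xᵢ², global minimum 0 at the origin.
func sphere(x []float64) float64 {
	s := 0.0
	for _, v := range x {
		s += v * v
	}
	return s
}

// Rastrigin: f(x) = 10n + Σ (xᵢ² − 10·cos(2π·xᵢ)), highly multimodal.
func rastrigin(x []float64) float64 {
	s := 10.0 * float64(len(x))
	for _, v := range x {
		s += v*v - 10*math.Cos(2*math.Pi*v)
	}
	return s
}

// Ackley: many local minima, global minimum 0 at the origin.
func ackley(x []float64) float64 {
	n := float64(len(x))
	var sumSq, sumCos float64
	for _, v := range x {
		sumSq += v * v
		sumCos += math.Cos(2 * math.Pi * v)
	}
	return -20*math.Exp(-0.2*math.Sqrt(sumSq/n)) -
		math.Exp(sumCos/n) + 20 + math.E
}

func main() {
	origin := []float64{0, 0, 0}
	fmt.Println(sphere(origin), rastrigin(origin), ackley(origin)) // 0 0 0
}
```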