@SteveDiamond
Last active April 20, 2026 23:14
Analysis: 27 Claude Code agents independently rediscovered 30 years of VRPTW optimization research in 4 hours

Analysis: Swarm VRPTW Algorithms vs. Literature

Context

27 Claude Code agents collaboratively optimized the Vehicle Routing Problem with Time Windows (VRPTW) on 400-node Homberger (HG) instances over ~4 hours on 2026-04-18. Starting from a Solomon I1 baseline (score 9525), the swarm achieved 6791 — a 28.7% improvement. Each agent independently evolved Rust solvers within a 30-second-per-instance time limit, single-threaded.


Key Findings

1. The Swarm Independently Rediscovered 30 Years of VRP Literature

The agents converged on an architecture nearly identical to what the academic community considers state-of-the-art for VRPTW. The final top solutions are structurally similar to:

  • Pisinger & Ropke (2007) — ALNS (Adaptive Large Neighborhood Search)
  • Vidal et al. (2012, 2022) — HGS (Hybrid Genetic Search) operator set
  • Ropke & Pisinger (2006) — Shaw removal, regret insertion
  • Potvin & Rousseau (1995) — 2-opt*, or-opt for VRPTW

Every competitive agent (top 12) independently arrived at some combination of:

  • Solomon I1 or regret-2 construction
  • RVND (Randomized Variable Neighborhood Descent) with 6-8 operators
  • ILS/ALNS destroy-repair metaheuristic
  • Shaw/worst/random destroy + regret-2/greedy repair

This is striking because no agent was told to read the literature — they derived these techniques from first principles and by studying each other's code through the branching mechanism.

2. Progression Mirrors the Historical Development of VRPTW Solvers

| Stage | Score Range | Equivalent Era | Key Unlock |
|---|---|---|---|
| Solomon baseline | 9525 | 1987 (Solomon) | Construction heuristic only |
| + Local search | 7600-8500 | 1990s (Potvin) | 2-opt, relocate, exchange |
| + Destroy-repair | 7000-7600 | 2006 (Ropke & Pisinger) | ALNS with random/worst/Shaw removal |
| + SA acceptance | 6900-7000 | 2007 (Pisinger & Ropke) | Stochastic acceptance escapes local optima |
| + RVND + K-neighbors | 6800-6900 | 2012-2022 (Vidal, HGS) | Neighbor-limited search, randomized operator order |
| + Route struct with lat arrays | 6791 | Modern (Vidal 2022) | O(1) feasibility → more iterations per second |

3. What Separated the Top 5 from the Pack

The critical dividing line sits at ~6930 (rank 8). Agents below that line share the same core architecture as the leaders; what pushed the top 5 to 6791-6883 was a handful of refinements:

| Technique | Top 5 | Ranks 8-12 |
|---|---|---|
| Stagnation-adaptive destroy % | Yes (scales 8%→30%+) | Fixed or simple scaling |
| Route elimination operator | All 5 | Most have it, but less aggressive |
| Multiple repair strategies | regret-2 + greedy + noisy | Usually regret-2 only |
| K=40 neighbor lists | All 5 | Some use K=30 or none |
| Merge-to-fleet preprocessing | Top 1 (agile-fox) | Rely on construction alone |
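The stagnation-adaptive destroy sizing is tiny in code yet tops this list. A sketch using agile-fox's reported constants (base 8%, +2% per non-improving iteration); the 30% cap is an assumption, since the table only says "30%+":

```rust
// Stagnation-adaptive destroy fraction: remove more of the solution the
// longer the search has gone without improving. Constants are illustrative.
fn destroy_fraction(stagnation: u32) -> f64 {
    (0.08 + 0.02 * stagnation as f64).min(0.30)
}

fn main() {
    let n = 400; // customers in a Homberger 400-node instance
    for s in [0u32, 5, 20] {
        let k = (destroy_fraction(s) * n as f64).round() as usize;
        println!("stagnation {s}: remove {k} customers");
    }
}
```

The effect is an automatic schedule between intensification (small perturbations while improving) and diversification (large ruins when stuck), which fixed-size destroy cannot provide.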

4. Most Interesting Individual Approaches

Agile-fox (#1, 6791) — Won with the simplest acceptance criterion (pure best-improvement, no SA). This is counterintuitive — the literature strongly favors SA or threshold acceptance. Agile-fox compensated with aggressive stagnation scaling (+2% destroy per non-improving iteration) and a merge-to-fleet preprocessing step that other agents lacked. Its nearest-neighbor construction (not Solomon) also suggests the initial solution matters less than the metaheuristic quality.

Noble-wolf (#13, 7089, 64KB) — The most ambitious codebase. Implemented the full HGS-style swap* operator (remove two customers, reinsert optimally), noisy greedy repair, and the most sophisticated Route struct with O(1) try_insert(). Despite having the richest toolbox, it placed only 13th. The lesson: implementation complexity doesn't correlate with performance. Noble-wolf likely spent too many iterations in expensive operators rather than rapid cycling. This mirrors a known tradeoff in the literature — Vidal (2022) specifically warns that operator richness must be balanced against iteration speed.

Primal-viper (#5, 6883, 50KB) — Largest code of the top 5. Notable for K=40 neighbor lists used across all operators, and alternating regret-2/greedy repair (every 4th iteration uses regret). This repair diversity is well-supported by Ropke & Pisinger's adaptive weight findings.

Silver-nova (#3, 6861) — The only top-3 agent with SA acceptance. Uses geometric cooling (T₀ = 1.2% of distance → T_end = 0.1%), which is notably more conservative than the ~5% starting temperatures common in the ALNS literature. Its lat (latest arrival time) arrays for O(1) insertion feasibility are the key technique from Vidal's HGS.

5. What's Missing vs. State-of-the-Art

The best published results on HG 400-node instances use:

  • Population-based methods (HGS uses a genetic algorithm with population management) — no agent attempted this
  • Education/crossover operators (PIX, OX, SREX) — completely absent
  • Penalty-based infeasible exploration — all agents rejected infeasible solutions immediately; the literature shows relaxing constraints during search and penalizing violations produces better results
  • Multi-objective fleet minimization — agents optimized distance only after ensuring feasibility, rather than treating fleet size as a soft constraint
  • Problem-specific distance caching — most agents recomputed distances rather than caching move deltas
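Of these, penalty-based infeasible exploration is the cheapest to retrofit: instead of rejecting a move that violates capacity or time windows, score it with weighted penalty terms and let the search traverse infeasible space. A minimal sketch with illustrative weights (real implementations adapt the weights during search):

```rust
// Penalised evaluation: infeasible solutions get a finite, penalised cost
// rather than outright rejection. Weights here are illustrative.
fn penalised_cost(distance: f64, load_excess: f64, tw_violation: f64,
                  w_load: f64, w_tw: f64) -> f64 {
    distance + w_load * load_excess + w_tw * tw_violation
}

fn main() {
    // A slightly overloaded route can outscore a feasible but longer detour,
    // letting the search pass through infeasible space toward a better optimum.
    let feasible = penalised_cost(120.0, 0.0, 0.0, 10.0, 10.0);
    let infeasible = penalised_cost(100.0, 1.0, 0.0, 10.0, 10.0);
    println!("feasible {feasible} vs infeasible {infeasible}");
}
```

This is the mechanism the HGS literature uses to keep infeasible individuals in the population; the swarm's hard-rejection stance closed off those paths.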

The gap between the swarm's best (6791) and published state-of-the-art for HG 400-node (typically ~6200-6400 depending on instance mix) is roughly 6-10%. Given the 30-second time limit and single-threaded constraint, this is a strong result — published benchmarks often allow minutes.

6. The Swarm Dynamics Were Fascinating

From the message log and branching history:

  • Knowledge transfer worked: Agents explicitly studied each other's code. Silver-nova messaged "Studying inspiration from noble-wolf. Key insights: neighbor-based local search (O(nK) instead of O(n²)), node locator for O(1) position lookups" and then jumped from 7021 → 6861.
  • The coin-flip branching created diversity: The 50/50 mechanism (global best vs. random peer) prevented convergence to a monoculture. Agents working from weaker branches sometimes found novel techniques.
  • Failed hypotheses were valuable: 84 of 197 hypotheses failed, and the shared failure log prevented duplicate work. Cross-exchange (2,2) was repeatedly found to regress on HG instances — multiple agents reported this independently.
  • Diminishing returns set in quickly: The score dropped from 9525 → 7059 in the first hour (26% improvement) but only 7059 → 6791 in the remaining 3 hours (4% more). This matches the typical VRPTW optimization curve.

7. Architectural Taxonomy

The 19 scoring agents fell into 4 architectural families:

| Family | Agents | Score Range | Description |
|---|---|---|---|
| ILS + RVND | agile-fox, primal-atlas, silver-nova, sharp-orbit, bright-raven, fierce-owl, noble-wolf | 6791-7089 | Randomized VND with ILS destroy-repair. Dominant family. |
| ALNS + deterministic LS | primal-viper, crystal-lynx, astral-fox, sonic-raven, azure-vector | 6883-7467 | ALNS metaheuristic with fixed-order local search. |
| SA + hybrid LS | vivid-beacon, cosmic-helix | 6917-7652 | Simulated annealing acceptance with mixed operators. |
| Construction only | keen-phoenix | 9525 | Solomon baseline, no optimization. |

The ILS+RVND family dominated the top ranks, which aligns with recent literature favoring ILS over pure ALNS (see Accorsi & Vigo, 2021).


Summary

The experiment demonstrates that LLM agents can independently rediscover decades of combinatorial optimization research in hours. The progression from Solomon baseline to ALNS/ILS+RVND closely mirrors the historical development of VRPTW solvers (1987-2022). The main gap to state-of-the-art is the absence of population-based methods and infeasible-solution exploration — techniques that require more architectural sophistication than iterative refinement of a single solution.

The most surprising result: the winner (agile-fox) used no stochastic acceptance at all, defeating SA-equipped competitors through aggressive adaptive destruction and better preprocessing. This challenges the conventional wisdom that SA is essential for competitive VRPTW performance.
