@eric-czech
Created March 20, 2026 11:25

Interactive Approach 2 Bias Application (Claude Prompt)

Prompt for artifact https://claude.ai/public/artifacts/ff4b6e45-cc20-4a96-b95c-57caac05bfff

Chinchilla Approach 2 Extrapolation Error Demo

Create an application comparing Chinchilla Approach 2 token extrapolations against a known, analytical ground truth.

All details below reference Hoffmann et al. 2022, "Training Compute-Optimal Large Language Models" (https://arxiv.org/abs/2203.15556).

Implementation:

  • Define the Chinchilla loss surface: L(N, D) = E + A/N^α + B/D^β with parameters α=0.34, β=0.28, A=406.4, B=410.7, E=1.69 (Appendix D)
  • Use the compute constraint C = 6 * N * D (FLOPs)
  • Derive the analytical ground truth (Section 3): minimizing L subject to the compute constraint yields Dₒₚₜ(C) = (1/G) * (C/6)^b where b = α/(α+β) and G = (αA / (βB))^(1/(α+β))
  • Implement Approach 2 (Section 3):
    • For each compute budget in [1e17, 1e18, 1e19, 1e20, 1e21], sample 16 points, evenly spaced in log₁₀(D), along the IsoFLOP curve from Dₒₚₜ/8 to 8*Dₒₚₜ, where Dₒₚₜ is the analytical optimum for that budget (a 64× ratio end-to-end, or ≈1.806 decades in log₁₀ space)
    • Fit a parabola to L vs log₁₀(D) for each budget and extract the vertex as the inferred Dₒₚₜ
  • Fit a power law log₁₀(Dₒₚₜ) = b * log₁₀(C) + b₀ to the inferred optima via linear regression
  • Extrapolate to C = 1e24 FLOPs: use the fitted coefficients (b, b₀) to predict Dₒₚₜ at the new budget, and compare against the analytical ground truth at the same budget
  • Print the true Dₒₚₜ, predicted Dₒₚₜ, and relative error (%), computed as 100 * (predicted − true) / true
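
The implementation steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the code behind the artifact: the function names (`loss`, `d_opt_true`, `d_opt_approach2`) and the `half_width` parameter are hypothetical, and the sampling grid is assumed to be evenly spaced in log₁₀(D).

```python
import numpy as np

# Chinchilla loss-surface parameters (Hoffmann et al. 2022, Appendix D)
ALPHA, BETA, A, B, E = 0.34, 0.28, 406.4, 410.7, 1.69


def loss(N, D):
    """L(N, D) = E + A / N^alpha + B / D^beta."""
    return E + A / N**ALPHA + B / D**BETA


def d_opt_true(C):
    """Analytical optimum D_opt(C) = (1/G) * (C/6)^b."""
    b = ALPHA / (ALPHA + BETA)
    G = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    return (C / 6.0) ** b / G


def d_opt_approach2(C, n_points=16, half_width=8.0):
    """Infer D_opt from the vertex of a parabola fit to L vs log10(D).

    Assumption: points are evenly spaced in log10(D) between
    D_opt / half_width and D_opt * half_width.
    """
    d_star = d_opt_true(C)
    log_d = np.linspace(np.log10(d_star / half_width),
                        np.log10(d_star * half_width), n_points)
    D = 10.0 ** log_d
    N = C / (6.0 * D)                 # compute constraint C = 6 * N * D
    x = log_d - log_d.mean()          # center x for a better-conditioned fit
    c2, c1, _ = np.polyfit(x, loss(N, D), 2)
    return 10.0 ** (log_d.mean() - c1 / (2.0 * c2))  # vertex at -c1 / (2 c2)


budgets = np.array([1e17, 1e18, 1e19, 1e20, 1e21])
inferred = np.array([d_opt_approach2(C) for C in budgets])

# Power-law fit: log10(D_opt) = b * log10(C) + b0
b_fit, b0_fit = np.polyfit(np.log10(budgets), np.log10(inferred), 1)

# Extrapolate to C = 1e24 FLOPs and compare with the analytical value
C_target = 1e24
d_pred = 10.0 ** (b_fit * np.log10(C_target) + b0_fit)
d_true = d_opt_true(C_target)
rel_err = 100.0 * (d_pred - d_true) / d_true

print(f"true D_opt      = {d_true:.4e}")
print(f"predicted D_opt = {d_pred:.4e}")
print(f"relative error  = {rel_err:.2f}%")
```

Because this sketch centers every grid on the analytical optimum with the same log-width, the parabola vertex is offset by the same factor at every budget. The fitted exponent should therefore match b closely, and the discrepancy should show up mainly in the intercept b₀.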

Inputs (configurable by user):

  • Loss surface parameters (α, β, A, B, E)
  • Sampling grid width (default: Dₒₚₜ/8 to 8*Dₒₚₜ)
  • Number of sampled points per IsoFLOP curve (default: 16)
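
One way to hold these inputs is a small validated config object. The sketch below is illustrative only: `DemoConfig` and its field names are hypothetical, and it assumes the sampling grid width is expressed as the total span in decades of log₁₀(D), so that the default of log₁₀(64) ≈ 1.806 corresponds to a factor of 8 on each side of Dₒₚₜ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoConfig:
    # Loss-surface parameters (defaults from Hoffmann et al. 2022, Appendix D)
    alpha: float = 0.34
    beta: float = 0.28
    A: float = 406.4
    B: float = 410.7
    E: float = 1.69
    # Assumption: grid width is the total span in decades of log10(D)
    grid_width_decades: float = math.log10(64)
    n_points: int = 16

    def __post_init__(self):
        if min(self.alpha, self.beta, self.A, self.B) <= 0:
            raise ValueError("alpha, beta, A and B must be positive")
        if self.grid_width_decades <= 0:
            raise ValueError("grid width must be positive")
        if self.n_points < 3:
            raise ValueError("a parabola fit needs at least 3 points")

    @property
    def half_width_factor(self) -> float:
        """Multiplicative distance from D_opt to either end of the grid."""
        return 10.0 ** (self.grid_width_decades / 2.0)


cfg = DemoConfig()
print(cfg.half_width_factor)  # approximately 8 for the default width
```

Requiring at least three points reflects that a quadratic has three coefficients; with fewer samples the parabola fit is underdetermined.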

Outputs:

  • IsoFLOP curves
  • Best fit parabolas for IsoFLOP curves
  • Inferred parabola minima
  • Log-linear regression results
  • Extrapolation results for both Approach 2 and the true value