Interactive Approach 2 Bias Application (Claude Prompt)

Chinchilla Approach 2 Extrapolation Error Demo

Create an application comparing Chinchilla Approach 2 token extrapolations against a known, analytical ground truth.

All details below reference Hoffmann et al. 2022, "Training Compute-Optimal Large Language Models" (https://arxiv.org/abs/2203.15556).

Implementation:

Define the Chinchilla loss surface: L(N, D) = E + A/N^α + B/D^β with parameters α=0.34, β=0.28, A=406.4, B=410.7, E=1.69 (Appendix D)
Use the compute constraint C = 6 * N * D (FLOPs)
Derive the analytical ground truth (Section 3.3, Equation 4): minimizing L subject to the compute constraint yields Dₒₚₜ(C) = (1/G) * (C/6)^b, where b = α/(α+β) and G = (αA / (βB))^(1/(α+β))
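The loss surface and closed-form optimum above could be sketched as follows. This is a minimal sketch, not a definitive implementation: the function names (`loss`, `d_opt_true`, `d_opt_scan`) are hypothetical, and the scan range of 1e6 to 1e15 tokens is an assumption chosen only to bracket the optima for the budgets used in this prompt. The scan serves as an independent numerical check on the analytical formula.

```python
import numpy as np

# Fitted constants quoted in the prompt (Hoffmann et al. 2022, Appendix D).
ALPHA, BETA = 0.34, 0.28
A, B, E = 406.4, 410.7, 1.69


def loss(n, d):
    """Parametric loss L(N, D) = E + A/N^alpha + B/D^beta."""
    return E + A / n**ALPHA + B / d**BETA


def d_opt_true(c):
    """Closed-form compute-optimal token count under C = 6 * N * D."""
    b = ALPHA / (ALPHA + BETA)
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    return (c / 6.0) ** b / g


def d_opt_scan(c, num=200_001):
    """Numerical cross-check: dense log-spaced scan along the IsoFLOP curve.

    Assumption: the optimum lies between 1e6 and 1e15 tokens, which holds
    for the budgets in this prompt but is not a general guarantee.
    """
    d = np.logspace(6, 15, num)
    n = c / (6.0 * d)  # parameter count implied by the compute constraint
    return d[np.argmin(loss(n, d))]


if __name__ == "__main__":
    for c in (1e17, 1e21, 1e24):
        print(f"C={c:.0e}  analytic={d_opt_true(c):.4e}  scan={d_opt_scan(c):.4e}")
```

If the two columns disagree by more than the scan resolution, the constants or the formula were likely transcribed incorrectly.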
Implement Approach 2 (IsoFLOP profiles, Section 3.2):
- For each compute budget C in [1e17, 1e18, 1e19, 1e20, 1e21] FLOPs, sample 16 points along the IsoFLOP curve (N = C / (6 * D)), with D spanning from Dₒₚₜ(C)/8 to 8*Dₒₚₜ(C), where Dₒₚₜ(C) is the analytical optimum for that budget (a 64× ratio end-to-end, or ≈1.806 decades in log₁₀ space)
- Fit a parabola to L vs log₁₀(D) for each budget and take the vertex location, converted back from log₁₀ space, as the inferred Dₒₚₜ
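The per-budget step of Approach 2 described above might look like the sketch below. Two assumptions are not stated in the prompt: the 16 points are spaced evenly in log₁₀(D), and the x-values are centered before fitting purely for numerical conditioning. The names `approach2_vertex`, `n_points`, and `span` are hypothetical.

```python
import numpy as np

ALPHA, BETA = 0.34, 0.28
A, B, E = 406.4, 410.7, 1.69


def d_opt_true(c):
    """Closed-form compute-optimal token count under C = 6 * N * D."""
    b = ALPHA / (ALPHA + BETA)
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    return (c / 6.0) ** b / g


def approach2_vertex(c, n_points=16, span=8.0):
    """Infer D_opt at budget c from the vertex of a parabola in log10(D)."""
    d_center = d_opt_true(c)
    # Assumption: samples are evenly spaced in log10(D), from D_opt/span to span*D_opt.
    log_d = np.linspace(np.log10(d_center / span), np.log10(d_center * span), n_points)
    d = 10.0 ** log_d
    n = c / (6.0 * d)  # stay on the IsoFLOP curve
    losses = E + A / n**ALPHA + B / d**BETA

    # Center x so the quadratic fit is well conditioned.
    x = log_d - log_d.mean()
    c2, c1, _ = np.polyfit(x, losses, 2)  # losses ~ c2*x^2 + c1*x + c0
    vertex_log_d = log_d.mean() - c1 / (2.0 * c2)
    return 10.0 ** vertex_log_d


if __name__ == "__main__":
    for c in (1e17, 1e18, 1e19, 1e20, 1e21):
        print(f"C={c:.0e}  inferred={approach2_vertex(c):.4e}  true={d_opt_true(c):.4e}")
```

Because the loss is not symmetric in log₁₀(D) when α ≠ β, the parabola vertex need not coincide with the true minimum even though the sampling window is centered on it.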
Fit a power law log₁₀(Dₒₚₜ) = b_fit * log₁₀(C) + b₀ to the inferred optima via linear regression. Here b_fit is the regression slope, an estimate of the analytical exponent b = α/(α+β), not the analytical value itself
Extrapolate to C = 1e24 FLOPs: use the fitted coefficients (b_fit, b₀) to predict Dₒₚₜ at the new budget, and compare against the analytical ground truth at the same budget
Print the true Dₒₚₜ, predicted Dₒₚₜ, and relative error (%), computed as 100 * (predicted - true) / true
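The regression, extrapolation, and reporting steps could be sketched as below. To keep the example short and independent, it is demonstrated on the analytical optima rather than on parabola vertices, so it acts as a sanity check: fed unbiased inputs, the fit should recover the exponent α/(α+β) and show essentially zero error. In the application, the inferred optima from Approach 2 would be passed in instead. All function names are hypothetical.

```python
import numpy as np

ALPHA, BETA = 0.34, 0.28
A, B = 406.4, 410.7


def d_opt_true(c):
    """Closed-form compute-optimal token count under C = 6 * N * D."""
    b = ALPHA / (ALPHA + BETA)
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    return (c / 6.0) ** b / g


def fit_power_law(budgets, d_opts):
    """Least-squares fit of log10(D_opt) = slope * log10(C) + intercept."""
    slope, intercept = np.polyfit(np.log10(budgets), np.log10(d_opts), 1)
    return slope, intercept


def extrapolate(slope, intercept, c):
    """Predict D_opt at budget c from the fitted log-log line."""
    return 10.0 ** (slope * np.log10(c) + intercept)


def relative_error_pct(predicted, true):
    """Signed relative error in percent."""
    return 100.0 * (predicted - true) / true


if __name__ == "__main__":
    budgets = np.array([1e17, 1e18, 1e19, 1e20, 1e21])
    # Sanity-check input: analytical optima stand in for the inferred ones here.
    d_opts = d_opt_true(budgets)
    slope, intercept = fit_power_law(budgets, d_opts)

    target = 1e24
    true_d = d_opt_true(target)
    pred_d = extrapolate(slope, intercept, target)
    print(f"fitted slope    = {slope:.6f}")
    print(f"true D_opt      = {true_d:.4e}")
    print(f"predicted D_opt = {pred_d:.4e}")
    print(f"relative error  = {relative_error_pct(pred_d, true_d):+.4f}%")
```

Keeping the fit and the error calculation in separate functions lets the same code report the error for either the analytical or the Approach 2 inputs.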

Inputs (configurable by user):

Outputs: