Prompt for artifact https://claude.ai/public/artifacts/ff4b6e45-cc20-4a96-b95c-57caac05bfff
Create an application comparing Chinchilla Approach 2 token extrapolations against a known, analytical ground truth.
All details below reference Hoffmann et al. 2022, "Training Compute-Optimal Large Language Models" (https://arxiv.org/abs/2203.15556).
Implementation:
- Define the Chinchilla loss surface: L(N, D) = E + A/N^α + B/D^β with parameters α=0.34, β=0.28, A=406.4, B=410.7, E=1.69 (Appendix D)
- Use the compute constraint C = 6 * N * D (FLOPs)
- Derive the analytical ground truth (Section 3.3, Equation 4): minimizing L subject to the compute constraint yields Dₒₚₜ(C) = (1/G) * (C/6)^b where b = α/(α+β) and G = (αA / (βB))^(1/(α+β))
- Implement Approach 2 (Section 3.2):
  - For each compute budget C in [1e17, 1e18, 1e19, 1e20, 1e21], sample 16 points along the IsoFLOP curve, evenly spaced in log₁₀(D) from Dₒₚₜ/8 to 8*Dₒₚₜ (a 64× ratio end-to-end, or ≈1.806 decades in log₁₀ space), where Dₒₚₜ is the analytical optimum for that budget and N = C / (6 * D) at each point
  - Fit a parabola to L vs log₁₀(D) for each budget and take its vertex as the inferred Dₒₚₜ
  - Fit a power law log₁₀(Dₒₚₜ) = b * log₁₀(C) + b₀ to the inferred optima via linear regression
- Extrapolate to C = 1e24 FLOPs: use the fitted coefficients (b, b₀) to predict Dₒₚₜ at the new budget, and compare against the analytical ground truth at the same budget
- Print the true Dₒₚₜ, predicted Dₒₚₜ, and relative error (%)
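The implementation steps above can be sketched in plain Python as follows. This is a minimal, standard-library-only sketch and not the artifact's actual code: the function names (`loss`, `d_opt_true`, `det3`, `fit_parabola_vertex`, `fit_line`, `approach2`) are hypothetical, the sample points are assumed to be evenly spaced in log₁₀(D) and centred on the analytical optimum, and the parabola is fitted by solving the least-squares normal equations with Cramer's rule.

```python
import math

# Loss-surface parameters as given in the prompt (Hoffmann et al. 2022, Appendix D).
ALPHA, BETA, A, B, E = 0.34, 0.28, 406.4, 410.7, 1.69


def loss(n, d):
    """Chinchilla loss surface L(N, D) = E + A/N^alpha + B/D^beta."""
    return E + A / n**ALPHA + B / d**BETA


def d_opt_true(c):
    """Analytical optimum D_opt(C) = (1/G) * (C/6)^b under C = 6*N*D."""
    b = ALPHA / (ALPHA + BETA)
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    return (c / 6.0) ** b / g


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola_vertex(xs, ys):
    """Least-squares fit of y = p0 + p1*x + p2*x^2; returns the vertex x."""
    mx = sum(xs) / len(xs)
    u = [x - mx for x in xs]  # centre x to keep the normal equations well conditioned
    n = len(u)
    s1, s2, s3, s4 = (sum(v**k for v in u) for k in (1, 2, 3, 4))
    t0, t1, t2 = (sum(v**k * y for v, y in zip(u, ys)) for k in (0, 1, 2))
    # Cramer's rule: p1 = d1/det and p2 = d2/det, so det cancels in -p1/(2*p2).
    d1 = det3([[n, t0, s2], [s1, t1, s3], [s2, t2, s4]])
    d2 = det3([[n, s1, t0], [s1, s2, t1], [s2, s3, t2]])
    return mx - d1 / (2.0 * d2)


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope*x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def approach2(budgets, n_points=16, width=8.0):
    """Approach 2: per-budget parabola vertices, then a log-log power-law fit."""
    half = math.log10(width)  # half-width of the sampling grid in decades
    log_c, log_d = [], []
    for c in budgets:
        centre = math.log10(d_opt_true(c))  # assumption: grid centred on the true optimum
        xs = [centre - half + 2.0 * half * i / (n_points - 1) for i in range(n_points)]
        ys = [loss(c / (6.0 * 10**x), 10**x) for x in xs]  # N = C / (6*D) on the IsoFLOP curve
        log_c.append(math.log10(c))
        log_d.append(fit_parabola_vertex(xs, ys))
    return fit_line(log_c, log_d)


b_fit, b0_fit = approach2([1e17, 1e18, 1e19, 1e20, 1e21])
c_target = 1e24
d_pred = 10 ** (b_fit * math.log10(c_target) + b0_fit)
d_true = d_opt_true(c_target)
rel_err_pct = 100.0 * (d_pred - d_true) / d_true
print(f"true D_opt      = {d_true:.4e}")
print(f"predicted D_opt = {d_pred:.4e}")
print(f"relative error  = {rel_err_pct:.3f}%")
```

Under these assumptions the loss along each IsoFLOP curve has the same shape around its optimum at every budget, up to a vertical scale. The parabola vertex is therefore offset from the true optimum by the same amount in log space each time, so the fitted slope should closely match α/(α+β) and the extrapolation error should come from the intercept b₀.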
Inputs (configurable by user):
- Loss surface parameters (α, β, A, B, E)
- Sampling grid width (the factor applied on each side of Dₒₚₜ; 8 by default, giving Dₒₚₜ/8 to 8*Dₒₚₜ)
- Number of sampled points per IsoFLOP curve
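One way to hold these inputs is a small validated record, sketched below. The class name `Config`, its field names, and the `span_decades` helper are hypothetical, and the sketch assumes the grid width is the multiplicative factor applied on each side of Dₒₚₜ (8 in the default setup).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical container for the user-configurable inputs."""
    alpha: float = 0.34
    beta: float = 0.28
    a: float = 406.4
    b: float = 410.7
    e: float = 1.69
    grid_width: float = 8.0  # assumed meaning: samples span D_opt/grid_width to grid_width*D_opt
    n_points: int = 16       # sampled points per IsoFLOP curve

    def __post_init__(self):
        # Reject values for which the fit or the analytical optimum is undefined.
        if self.alpha <= 0 or self.beta <= 0 or self.a <= 0 or self.b <= 0:
            raise ValueError("alpha, beta, a and b must be positive")
        if self.grid_width <= 1:
            raise ValueError("grid_width must exceed 1")
        if self.n_points < 3:
            raise ValueError("a parabola fit needs at least 3 points")

    def span_decades(self):
        """End-to-end width of the sampling grid in log10 space."""
        return 2.0 * math.log10(self.grid_width)


cfg = Config()
print(f"{cfg.span_decades():.3f} decades")
```

Validating in `__post_init__` means an invalid setting fails when the inputs are set, before any curve is sampled or fitted.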
Outputs:
- IsoFLOP curves
- Best-fit parabolas for IsoFLOP curves
- Inferred parabola minima
- Log-log linear regression results (fitted slope b and intercept b₀)
- Extrapolation results at C = 1e24: Dₒₚₜ predicted by Approach 2, the analytical true Dₒₚₜ, and the relative error (%)