Draft for EA Forum — feedback welcome
Human values change. Sometimes predictably — generational replacement, exposure effects, information cascades. If LLMs have learned these patterns from their training data, they might predict value trajectories better than simple extrapolation.
This matters for alignment. If we could forecast where human values are heading, we'd have a tool for:
- Anticipating moral circle expansion
- Understanding which alignment targets are stable vs moving
- Informing long-term AI governance
I built value-forecasting (repo linked at the end) to test this. The method (a code sketch follows this list):
- Select GSS variables with significant historical change (same-sex acceptance, marijuana legalization, etc.)
- Prompt LLMs with data available up to a cutoff year (e.g., 2000)
- Generate predictions for future years with uncertainty bounds
- Compare to actual GSS data
- Benchmark against time series models (linear extrapolation, ARIMA, ETS)
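Here's a minimal sketch of that loop in Python, assuming the GSS series has already been pulled into a pandas Series of "% agree" values indexed by survey year. The function names, the ARIMA order, and the prompt wording are illustrative, not the repo's actual code.

```python
# Sketch of the evaluation loop (illustrative, not the repo's actual code).
# Assumes `series` is a pandas Series of "% agree" values indexed by survey year.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA


def truncate(series: pd.Series, cutoff: int) -> pd.Series:
    """Keep only observations a forecaster would have had at the cutoff year."""
    return series[series.index <= cutoff]


def linear_baseline(series: pd.Series, target_years: list[int]) -> dict[int, float]:
    """Least-squares trend through the pre-cutoff points, projected forward."""
    slope, intercept = np.polyfit(series.index.astype(float), series.values, deg=1)
    return {y: slope * y + intercept for y in target_years}


def arima_baseline(series: pd.Series, target_years: list[int]) -> dict[int, float]:
    """ARIMA(1,1,0) fit on the truncated series; the order is chosen for illustration.
    Treats the series as annual, which glosses over the GSS's two-year gaps."""
    fit = ARIMA(series.values, order=(1, 1, 0)).fit()
    last_year = int(series.index.max())
    path = fit.forecast(steps=max(target_years) - last_year)
    return {y: float(path[y - last_year - 1]) for y in target_years}


def build_prompt(variable: str, series: pd.Series, cutoff: int, target_years: list[int]) -> str:
    """Construct the LLM prompt from pre-cutoff data only."""
    history = "\n".join(f"{int(y)}: {v:.1f}%" for y, v in series.items())
    return (
        f"It is {cutoff}. Here are General Social Survey results for {variable}:\n"
        f"{history}\n"
        f"Predict the value for each of {target_years}, giving a point estimate "
        "and a 90% interval. Use only information available before the cutoff."
    )
```

The LLM call itself goes through the provider's API with a prompt like the one above, ETS slots in the same way via statsmodels' ExponentialSmoothing, and all methods are scored against the held-out GSS values.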
The numbers looked promising:
| Model | MAE | Coverage (90% CI) | Bias |
|---|---|---|---|
| LLM (Claude) | 12.5% | 42.9% | -12.4% |
| Linear | 30.2% | 35.7% | -30.2% |
| ARIMA | 31.4% | 50.0% | -31.4% |
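For concreteness, here is how I read the three columns. The scoring function below is a sketch with illustrative field names: mean absolute error, 90%-interval coverage, and mean signed error (bias) over all (variable, target year) predictions.

```python
# Scoring sketch with illustrative field names: each prediction record holds
# the point forecast, its 90% interval, and the realized GSS value.
import numpy as np


def score(predictions: list[dict]) -> dict:
    errors = np.array([p["forecast"] - p["actual"] for p in predictions])
    covered = np.array([p["lo"] <= p["actual"] <= p["hi"] for p in predictions])
    return {
        "mae": float(np.mean(np.abs(errors))),   # mean absolute error, in points
        "coverage_90": float(np.mean(covered)),  # share of actuals inside the 90% interval
        "bias": float(np.mean(errors)),          # mean signed error; negative = under-prediction
    }
```

The uniformly negative bias means all three methods under-predicted the realized values: everyone underestimated how fast these attitudes liberalized, the LLM just underestimated least. Even so, its 90% intervals covered the truth only 43% of the time.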
For same-sex acceptance (HOMOSEX), predicting from 1990 to 2021:
- Linear extrapolation: 16.8% → Actual: 64% ❌
- LLM prediction: 48% → Actual: 64% ✓
The LLM saw the inflection point that extrapolation missed.
Here's where it falls apart.
Claude was trained on data through ~2024. Its weights contain:
- News articles: "Same-sex marriage support hits 70%"
- GSS results from 2010, 2018, 2021
- Wikipedia pages documenting the full trajectory
When I ask Claude to "predict" 2021 values using only information available in 2000, I'm not testing forecasting ability — I'm testing whether it can suppress recall while adding plausible uncertainty bounds. That's a very different skill.
The comparison to time series models isn't fair. ARIMA only sees pre-cutoff data. Claude sees everything and pretends not to.
This week's 80,000 Hours podcast with David Duvenaud discusses exactly this problem. Duvenaud and Alec Radford (GPT co-creator) are building historical LLMs — models trained exclusively on data up to specific cutoff years (1930, 1940, etc.).
Their approach:
- Curate training data with verified publication dates
- Aggressively filter for contamination (LLMs flagging anachronistic phrases)
- Validate on historical predictions before applying to future forecasts
This is the right methodology. The "huge schlep," as Duvenaud puts it, is data cleaning — constantly finding unintentional contamination.
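To make the filtering step concrete, here's a toy version of an anachronism check using the Anthropic Python SDK. The prompt, the model choice, and the yes/no thresholding are my assumptions, not Duvenaud and Radford's actual pipeline.

```python
# Toy contamination filter: ask a current LLM whether a candidate document
# contains anything that could only have been written after the cutoff year.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def looks_anachronistic(text: str, cutoff_year: int) -> bool:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # any capable model works; this one is an example
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": (
                f"Does the following passage contain any phrase, fact, or named "
                f"entity that could only exist after {cutoff_year}? "
                f"Answer YES or NO.\n\n{text[:4000]}"
            ),
        }],
    )
    return response.content[0].text.strip().upper().startswith("YES")
```

In practice this would run over candidate documents before they enter the pre-cutoff corpus, erring on the side of exclusion, since a single leaked trajectory defeats the purpose.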
Option 1: Historical LLMs. Use Duvenaud and Radford's models when available, or use older models (GPT-2, early GPT-3) with known training cutoffs, though their instruction-following is worse.
Option 2: Forward Predictions Only. Ask current LLMs about 2030 values. Commit to the predictions now (a pre-registration sketch follows the options). Wait. Evaluate in 2030. Not great for a paper this month.
Option 3: Obscure Variables. Find GSS questions unlikely to appear in news coverage. FEPOL ("women suited for politics") might be less contaminated than HOMOSEX or GRASS, which were major stories.
Option 4: Reframe the Research Question. Not "LLMs forecast better" but "LLMs as value elicitation tools", which is useful for HiveSight-style applications even if it's recall, not prediction.
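For Option 2, the main thing is committing to the numbers now, so that 2030-me (or a reviewer) can trust they weren't revised. A minimal pre-registration sketch, with a file layout and field names of my own invention:

```python
# Append-only prediction log for forward forecasts, to be scored once the
# relevant GSS waves are released. Field names and layout are illustrative.
import datetime
import hashlib
import json


def register_prediction(variable: str, target_year: int, forecast: float,
                        lo: float, hi: float, path: str = "predictions.jsonl") -> str:
    record = {
        "variable": variable,
        "target_year": target_year,
        "forecast": forecast,
        "ci90": [lo, hi],
        "registered_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    # Hash the committed record so later edits are detectable.
    record["sha256"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["sha256"]
```

Publishing the hashes (or the whole file) somewhere timestamped, say a public commit, is what turns this from a note-to-self into something a reviewer can check.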
Why does this matter beyond methodology?
The gradual disempowerment thesis argues that even aligned AI could lead to bad outcomes through competitive dynamics. One countermeasure: better forecasting of where values and institutions are heading.
If LLMs can genuinely predict moral trajectories (not just recall them), that's a tool for:
- Anticipating value drift before it happens
- Identifying which human preferences are stable alignment targets
- Building simulation infrastructure for collective reasoning (Society in Silico thesis)
If they can't — if it's fundamentally recall, not prediction — that's important to know. It means we need different approaches to value forecasting.
I'm presenting related work at EA Global San Francisco (February 2026) — specifically on AI and inequality simulation.
If you're interested in:
- Collaborating on historical LLM experiments
- Methodological approaches to value forecasting
- Connecting this to alignment research
Reach out: [email protected] / @MaxGhenis
This post represents preliminary work. The honest conclusion: I wanted to test whether LLMs can forecast values, discovered that the contamination problem makes retrospective evaluation with current models impossible, and am now thinking about what comes next.
- Value Forecasting repo
- 80,000 Hours podcast: David Duvenaud on gradual disempowerment
- Gradual Disempowerment paper
- General Social Survey
- Argyle et al. (2023): silicon sampling methodology