Can LLMs Forecast Human Value Evolution? (And Why We Can't Tell Yet)

Draft for EA Forum — feedback welcome

The Hypothesis

Human values change. Sometimes predictably — generational replacement, exposure effects, information cascades. If LLMs have learned these patterns from their training data, they might predict value trajectories better than simple extrapolation.

This matters for alignment. If we could forecast where human values are heading, we'd have a tool for:

  • Anticipating moral circle expansion
  • Understanding which alignment targets are stable vs moving
  • Informing long-term AI governance

The Experiment

I built value-forecasting to test this. The method (a code sketch follows the list):

  1. Select GSS variables with significant historical change (same-sex acceptance, marijuana legalization, etc.)
  2. Prompt LLMs with data available up to a cutoff year (e.g., 2000)
  3. Generate predictions for future years with uncertainty bounds
  4. Compare to actual GSS data
  5. Benchmark against time series models (linear extrapolation, ARIMA, ETS)
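
A minimal sketch of steps 2–4, assuming a hypothetical `ask_llm` helper that wraps whatever chat API is used; the prompt wording, function names, and data format here are illustrative, not the repo's actual code:

```python
import json

def build_prompt(variable, history, cutoff, target_years):
    """Show the model only pre-cutoff observations and ask for point forecasts with 90% bounds."""
    visible = {year: value for year, value in history.items() if year <= cutoff}
    return (
        f"GSS variable {variable}, share agreeing by year: {json.dumps(visible)}.\n"
        f"Using only information available in {cutoff}, forecast the share for each year in "
        f"{target_years}. Reply as JSON mapping year to {{'point': p, 'lo90': lo, 'hi90': hi}}."
    )

def evaluate_variable(variable, history, cutoff, target_years, ask_llm):
    """Score the model's cutoff-restricted forecasts against held-out post-cutoff GSS values."""
    preds = json.loads(ask_llm(build_prompt(variable, history, cutoff, target_years)))
    results = []
    for year in target_years:
        actual = history[year]                # held-out ground truth
        p = preds[str(year)]                  # JSON object keys come back as strings
        results.append({
            "year": year,
            "abs_error": abs(p["point"] - actual),
            "covered": p["lo90"] <= actual <= p["hi90"],
            "bias": p["point"] - actual,
        })
    return results
```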

Preliminary Results

The numbers looked promising:

| Model        | MAE   | Coverage (90% CI) | Bias   |
|--------------|-------|-------------------|--------|
| LLM (Claude) | 12.5% | 42.9%             | -12.4% |
| Linear       | 30.2% | 35.7%             | -30.2% |
| ARIMA        | 31.4% | 50.0%             | -31.4% |
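
For reference, a minimal sketch of how these three metrics are computed from the per-year results above (variable names are mine):

```python
import numpy as np

def score(point, lo90, hi90, actual):
    """MAE, 90% interval coverage, and mean signed bias (prediction minus actual)."""
    point, lo90, hi90, actual = map(np.asarray, (point, lo90, hi90, actual))
    return {
        "MAE": float(np.mean(np.abs(point - actual))),
        "coverage_90": float(np.mean((lo90 <= actual) & (actual <= hi90))),  # share of years inside the interval
        "bias": float(np.mean(point - actual)),  # negative means systematic underprediction
    }
```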

For same-sex acceptance (HOMOSEX), predicting from 1990 to 2021:

  • Linear extrapolation: 16.8% vs. actual 64% ❌
  • LLM prediction: 48% vs. actual 64% ✓

The LLM saw the inflection point that extrapolation missed.
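
To make concrete what the baselines do, here is a sketch of the linear and ARIMA forecasts (the ARIMA order shown is illustrative, not necessarily what was used):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def linear_forecast(years, values, target_year):
    """Straight-line OLS extrapolation of the pre-cutoff trend."""
    slope, intercept = np.polyfit(years, values, deg=1)
    return intercept + slope * target_year

def arima_forecast(values, steps, order=(1, 1, 0)):
    """ARIMA fit on the pre-cutoff series, forecast `steps` periods ahead."""
    return ARIMA(values, order=order).fit().forecast(steps=steps)
```

Because the pre-1990 series was nearly flat, the fitted line stays low no matter how far it is extended; none of these methods can anticipate an inflection that has not yet started.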

The Problem: Data Contamination

Here's where it falls apart.

Claude was trained on data through ~2024. Its weights contain:

  • News articles: "Same-sex marriage support hits 70%"
  • GSS results from 2010, 2018, 2021
  • Wikipedia pages documenting the full trajectory

When I ask Claude to "predict" 2021 values using only information available in 2000, I'm not testing forecasting ability — I'm testing whether it can suppress recall while adding plausible uncertainty bounds. That's a very different skill.

The comparison to time series models isn't fair. ARIMA only sees pre-cutoff data. Claude sees everything and pretends not to.

The Real Methodological Challenge

This week's 80,000 Hours podcast with David Duvenaud discusses exactly this problem. Duvenaud and Alec Radford (GPT co-creator) are building historical LLMs — models trained exclusively on data up to specific cutoff years (1930, 1940, etc.).

Their approach:

  1. Curate training data with verified publication dates
  2. Aggressively filter for contamination, with LLMs flagging anachronistic phrases (see the sketch after this list)
  3. Validate on historical predictions before applying to future forecasts
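
A minimal sketch of what step 2 could look like, using a hypothetical `ask_llm` helper; the prompt wording is mine, not Duvenaud and Radford's:

```python
def flags_anachronism(document, cutoff_year, ask_llm):
    """Ask an LLM whether a document references anything that postdates the cutoff year.

    Returns True if the document should be dropped from the cutoff-year training corpus.
    `ask_llm` stands in for any chat-completion call.
    """
    prompt = (
        f"The text below is supposed to have been written no later than {cutoff_year}. "
        f"Does it mention any event, person, technology, or phrase that only became known "
        f"after {cutoff_year}? Answer YES or NO on the first line, then list any anachronisms.\n\n"
        f"{document}"
    )
    return ask_llm(prompt).strip().upper().startswith("YES")
```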

This is the right methodology. The "huge schlep," as Duvenaud puts it, is data cleaning — constantly finding unintentional contamination.

What Would Work

Option 1: Historical LLMs. Use Duvenaud/Radford's models when available. Or use older models (GPT-2, early GPT-3) with known training cutoffs — though their instruction-following is worse.

Option 2: Forward Predictions Only. Ask current LLMs about 2030 values. Wait. Evaluate in 2030. Not great for a paper this month.
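
The main requirement for this option is committing predictions now in a form that can be scored mechanically later, for example a dated record like this sketch (field names and values are illustrative, not real forecasts):

```python
import datetime
import json

record = {
    "variable": "HOMOSEX",                      # GSS mnemonic
    "target_year": 2030,
    "model": "claude-...",                      # whichever model was queried
    "elicited_on": datetime.date.today().isoformat(),
    "point": 0.72, "lo90": 0.64, "hi90": 0.80,  # illustrative numbers only
}
with open("forward_predictions.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```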

Option 3: Obscure Variables. Find GSS questions unlikely to appear in news coverage. FEPOL ("women suited for politics") might be less contaminated than HOMOSEX/GRASS, which were major stories.

Option 4: Reframe the Research Question. Not "LLMs forecast better" but "LLMs as value elicitation tools" — useful for HiveSight-style applications even if it's recall, not prediction.

Connection to AI Alignment

Why does this matter beyond methodology?

The gradual disempowerment thesis argues that even aligned AI could lead to bad outcomes through competitive dynamics. One countermeasure: better forecasting of where values and institutions are heading.

If LLMs can genuinely predict moral trajectories (not just recall them), that's a tool for:

  • Anticipating value drift before it happens
  • Identifying which human preferences are stable alignment targets
  • Building simulation infrastructure for collective reasoning (Society in Silico thesis)

If they can't — if it's fundamentally recall, not prediction — that's important to know. It means we need different approaches to value forecasting.

Looking for Collaborators

I'm presenting related work at EA Global San Francisco (February 2026) — specifically on AI and inequality simulation.

If you're interested in:

  • Collaborating on historical LLM experiments
  • Methodological approaches to value forecasting
  • Connecting this to alignment research

Reach out: [email protected] / @MaxGhenis


This post represents preliminary work. The honest conclusion: we wanted to test whether LLMs can forecast values, discovered that data contamination makes retrospective evaluation impossible, and are now thinking about what comes next.
