Skip to content

Instantly share code, notes, and snippets.

@bmorphism
Last active April 3, 2025 13:36
Show Gist options
  • Save bmorphism/c2baf1068c708ad47c6b5cca228102ba to your computer and use it in GitHub Desktop.
Save bmorphism/c2baf1068c708ad47c6b5cca228102ba to your computer and use it in GitHub Desktop.
AlphaProof in PaperBench

How to Set Up AlphaProof in PaperBench

Steps to Create and Adapt a New Paper

  1. Create Paper Directory Structure:

    mkdir -p /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench/data/papers/alphaproof/{assets,judge}
  2. Create Required Files:

    • config.yaml - Basic paper info
    • paper.md - The paper content (markdown form)
    • paper.pdf - The paper in PDF
    • rubric.json - Evaluation criteria
    • addendum.md - Important implementation details
    • blacklist.txt - Websites that shouldn't be accessed
    • judge.addendum.md - Notes for the judge
  3. Update Paper Splits:

    echo "alphaproof" >> /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench/experiments/splits/debug.txt

AlphaProof-Specific Components

Core AlphaProof Implementation

AlphaProof requires several key components to replicate:

  1. Formal Verification System:

    • Set up Lean theorem prover environment (mentioned in DeepMind paper)
    • Configure integration with language models
  2. Reinforcement Learning Components:

    • AlphaZero-like training process
    • Self-improvement cycle for proof generation
  3. Data Processing Pipeline:

    • Paper mentions automatic translation of natural language to formal statements
    • Need formalized problem system

Rubric Design

The rubric should verify:

  1. Code Development Requirements:

    • Lean Theorem Prover integration
    • Language model integration for problem translation
    • AlphaZero-based search mechanism
    • Training pipeline for self-improvement
  2. Execution Requirements:

    • Proof generation on sample problems
    • Proof validation using Lean
    • Reinforcement learning training process
  3. Results Match Requirements:

    • Successful formal proof generation
    • Proof verification correctness
    • Performance metrics compared to paper

Checking Running Status & Results

To check the progress of an AlphaProof implementation:

  1. Monitor the runs directory:

    cd /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench
    source .venv/bin/activate
    ls -la runs/[DATE]_run-group_*/alphaproof_*
  2. Examine logs:

    cat runs/[DATE]_run-group_*/group.log
    cat runs/[DATE]_run-group_*/alphaproof_*/run.log
  3. Check results:

    cat runs/[DATE]_run-group_*/alphaproof_*/pb_result.json

Running with AlphaProof Implementation

Execute using the following command:

cd /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench
source .venv/bin/activate
python -m paperbench.nano.entrypoint \
    paperbench.paper_split=debug \
    paperbench.solver=paperbench.nano.eval:ExternalPythonCodingSolver \
    paperbench.solver.agent_id=aisi-basic-agent-openai-dev \
    paperbench.solver.cluster_config=alcatraz.clusters.local:LocalConfig \
    paperbench.solver.cluster_config.image=aisi-basic-agent:latest \
    paperbench.judge.code_only=True \
    runner.recorder=nanoeval.json_recorder:json_recorder

The process will first run the agent to create an implementation, then judge it according to the rubric. Add code_only=True to skip the reproduction step.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment