Create Paper Directory Structure:
mkdir -p /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench/data/papers/alphaproof/{assets,judge}
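This creates the paper's root directory plus the two standard subdirectories; the required files from the next step sit at the top level:
papers/alphaproof/
├── assets/
└── judge/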
Create Required Files:
- config.yaml - Basic paper info
- paper.md - The paper content (markdown form)
- paper.pdf - The paper in PDF
- rubric.json - Evaluation criteria
- addendum.md - Important implementation details
- blacklist.txt - Websites that shouldn't be accessed
- judge.addendum.md - Notes for the judge
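For example, a minimal config.yaml might hold little more than an identifier and a title; the exact fields come from PaperBench's schema, so treat these as a sketch rather than the canonical format:
id: alphaproof
title: "<full title of the AlphaProof paper>"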
Update Paper Splits:
echo "alphaproof" >> /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench/experiments/splits/debug.txt
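To confirm the registration, the split file should now list the paper id:
cat /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench/experiments/splits/debug.txt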
Replicating AlphaProof requires several key components:
Formal Verification System:
- Set up a Lean theorem prover environment (mentioned in the DeepMind paper); a setup sketch follows below
- Configure integration with language models
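One way to provision Lean locally is via elan, the Lean toolchain manager; these commands are a general setup sketch, not steps taken from the paper:
curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh
elan toolchain install leanprover/lean4:stable
lean --version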
Reinforcement Learning Components:
- AlphaZero-like training process
- Self-improvement cycle for proof generation (sketched in the Python snippet below)
Data Processing Pipeline:
- The paper describes automatic translation of natural-language problems into formal statements
- Requires a pipeline for producing formalized problem statements
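Neither the paper nor PaperBench publishes code for this loop, so the following is only a minimal Python sketch of a verification-gated self-improvement cycle; every name in it (lean_verifies, self_improvement_cycle, the generate_proofs and finetune callbacks) is hypothetical, and it assumes the lean binary is on PATH:

import pathlib
import subprocess
import tempfile

def lean_verifies(proof_src: str, timeout: int = 60) -> bool:
    """Compile a candidate proof with the Lean 4 CLI; True iff it type-checks."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(proof_src)
        path = pathlib.Path(f.name)
    try:
        result = subprocess.run(
            ["lean", str(path)], capture_output=True, text=True, timeout=timeout
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        path.unlink()

def self_improvement_cycle(generate_proofs, finetune, problems, iterations=3):
    """Expert-iteration skeleton (hypothetical): sample candidate proofs,
    keep only the Lean-verified ones, and feed them back as training data."""
    for _ in range(iterations):
        verified = [
            (problem, proof)
            for problem in problems
            for proof in generate_proofs(problem)
            if lean_verifies(proof)
        ]
        finetune(verified)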
The rubric should verify:
Code Development Requirements:
- Lean Theorem Prover integration
- Language model integration for problem translation
- AlphaZero-based search mechanism
- Training pipeline for self-improvement
Execution Requirements:
- Proof generation on sample problems
- Proof validation using Lean
- Reinforcement learning training process
Results Match Requirements:
- Successful formal proof generation
- Proof verification correctness
- Performance metrics comparable to those reported in the paper
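Concretely, a rubric.json covering these checks might start like the sketch below; PaperBench defines the real schema, so the field names here (id, requirements, task_category, weight, sub_tasks) are illustrative assumptions:
{
  "id": "alphaproof",
  "requirements": "The submission replicates AlphaProof's formal proof pipeline.",
  "weight": 1,
  "sub_tasks": [
    {
      "id": "lean-integration",
      "requirements": "The code integrates the Lean theorem prover and can check candidate proofs.",
      "task_category": "Code Development",
      "weight": 1,
      "sub_tasks": []
    },
    {
      "id": "proof-generation",
      "requirements": "Running the pipeline produces at least one Lean-verified proof on a sample problem.",
      "task_category": "Execution",
      "weight": 1,
      "sub_tasks": []
    }
  ]
}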
To check the progress of an AlphaProof implementation:
Monitor the runs directory:
cd /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench
source .venv/bin/activate
ls -la runs/[DATE]_run-group_*/alphaproof_*
Examine logs:
cat runs/[DATE]_run-group_*/group.log
cat runs/[DATE]_run-group_*/alphaproof_*/run.log
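While a run is still in progress, tailing the run log gives live output:
tail -f runs/[DATE]_run-group_*/alphaproof_*/run.log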
Check results:
cat runs/[DATE]_run-group_*/alphaproof_*/pb_result.json
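If jq is installed, pretty-printing makes the result easier to scan:
jq . runs/[DATE]_run-group_*/alphaproof_*/pb_result.json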
Execute using the following command:
cd /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench
source .venv/bin/activate
python -m paperbench.nano.entrypoint \
    paperbench.paper_split=debug \
    paperbench.solver=paperbench.nano.eval:ExternalPythonCodingSolver \
    paperbench.solver.agent_id=aisi-basic-agent-openai-dev \
    paperbench.solver.cluster_config=alcatraz.clusters.local:LocalConfig \
    paperbench.solver.cluster_config.image=aisi-basic-agent:latest \
    paperbench.judge.code_only=True \
    runner.recorder=nanoeval.json_recorder:json_recorder
The process first runs the agent to produce an implementation, then judges it against the rubric. The paperbench.judge.code_only=True argument (already included above) makes the judge grade the code only, skipping the reproduction step.