Create Paper Directory Structure:
mkdir -p /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench/data/papers/alphaproof/{assets,judge}
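This creates the paper's root directory plus the two standard subdirectories; the required files from the next step sit at the top level:
papers/alphaproof/
├── assets/
└── judge/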
Create Required Files:
- config.yaml - Basic paper info
- paper.md - The paper content (markdown form)
- paper.pdf - The paper in PDF
- rubric.json - Evaluation criteria
- addendum.md - Important implementation details
- blacklist.txt - Websites that shouldn't be accessed
- judge.addendum.md - Notes for the judge
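For example, a minimal config.yaml might hold little more than an identifier and a title; the exact fields come from PaperBench's schema, so treat these as a sketch rather than the canonical format:
id: alphaproof
title: "<full title of the AlphaProof paper>"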
Update Paper Splits:
echo "alphaproof" >> /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench/experiments/splits/debug.txt
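To confirm the registration, the split file should now list the paper id:
cat /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench/experiments/splits/debug.txt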
Replicating AlphaProof requires several key components:
Formal Verification System:
- Set up a Lean theorem prover environment (mentioned in the DeepMind paper); a setup sketch follows below
- Configure integration with language models
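One way to provision Lean locally is via elan, the Lean toolchain manager; these commands are a general setup sketch, not steps taken from the paper:
curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh
elan toolchain install leanprover/lean4:stable
lean --version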
Reinforcement Learning Components:
- AlphaZero-like training process
- Self-improvement cycle for proof generation (sketched in the Python snippet below)
Data Processing Pipeline:
- The paper describes automatic translation of natural-language problems into formal statements
- Requires a pipeline for producing formalized problem statements
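Neither the paper nor PaperBench publishes code for this loop, so the following is only a minimal Python sketch of a verification-gated self-improvement cycle; every name in it (lean_verifies, self_improvement_cycle, the generate_proofs and finetune callbacks) is hypothetical, and it assumes the lean binary is on PATH:

import pathlib
import subprocess
import tempfile

def lean_verifies(proof_src: str, timeout: int = 60) -> bool:
    """Compile a candidate proof with the Lean 4 CLI; True iff it type-checks."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(proof_src)
        path = pathlib.Path(f.name)
    try:
        result = subprocess.run(
            ["lean", str(path)], capture_output=True, text=True, timeout=timeout
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        path.unlink()

def self_improvement_cycle(generate_proofs, finetune, problems, iterations=3):
    """Expert-iteration skeleton (hypothetical): sample candidate proofs,
    keep only the Lean-verified ones, and feed them back as training data."""
    for _ in range(iterations):
        verified = [
            (problem, proof)
            for problem in problems
            for proof in generate_proofs(problem)
            if lean_verifies(proof)
        ]
        finetune(verified)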
The rubric should verify:
Code Development Requirements:
- Lean Theorem Prover integration
- Language model integration for problem translation
- AlphaZero-based search mechanism
- Training pipeline for self-improvement
Execution Requirements:
- Proof generation on sample problems
- Proof validation using Lean
- Reinforcement learning training process
Results Match Requirements:
- Successful formal proof generation
- Proof verification correctness
- Performance metrics comparable to those reported in the paper
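Concretely, a rubric.json covering these checks might start like the sketch below; PaperBench defines the real schema, so the field names here (id, requirements, task_category, weight, sub_tasks) are illustrative assumptions:
{
  "id": "alphaproof",
  "requirements": "The submission replicates AlphaProof's formal proof pipeline.",
  "weight": 1,
  "sub_tasks": [
    {
      "id": "lean-integration",
      "requirements": "The code integrates the Lean theorem prover and can check candidate proofs.",
      "task_category": "Code Development",
      "weight": 1,
      "sub_tasks": []
    },
    {
      "id": "proof-generation",
      "requirements": "Running the pipeline produces at least one Lean-verified proof on a sample problem.",
      "task_category": "Execution",
      "weight": 1,
      "sub_tasks": []
    }
  ]
}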
To check the progress of an AlphaProof implementation:
Monitor the runs directory:
cd /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench
source .venv/bin/activate
ls -la runs/[DATE]_run-group_*/alphaproof_*
Examine logs:
cat runs/[DATE]_run-group_*/group.log
cat runs/[DATE]_run-group_*/alphaproof_*/run.log
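While a run is still in progress, tailing the run log gives live output:
tail -f runs/[DATE]_run-group_*/alphaproof_*/run.log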
Check results:
cat runs/[DATE]_run-group_*/alphaproof_*/pb_result.json
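If jq is installed, pretty-printing makes the result easier to scan:
jq . runs/[DATE]_run-group_*/alphaproof_*/pb_result.json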
Execute using the following command:
cd /Users/barton/.cursor/infinity-topos/preparedness/project/paperbench
source .venv/bin/activate
python -m paperbench.nano.entrypoint \
    paperbench.paper_split=debug \
    paperbench.solver=paperbench.nano.eval:ExternalPythonCodingSolver \
    paperbench.solver.agent_id=aisi-basic-agent-openai-dev \
    paperbench.solver.cluster_config=alcatraz.clusters.local:LocalConfig \
    paperbench.solver.cluster_config.image=aisi-basic-agent:latest \
    paperbench.judge.code_only=True \
    runner.recorder=nanoeval.json_recorder:json_recorder
The process first runs the agent to produce an implementation, then judges it against the rubric. The paperbench.judge.code_only=True argument (already included above) makes the judge grade the code only, skipping the reproduction step.