I spent today experimenting with OpenEvolve (the open-source counterpart to DeepMind’s AlphaEvolve) on a RHEL 9 laptop using the OpenAI API. To get it running, I fixed a small hashing issue, swapping hashlib.md5 for hashlib.sha256 because of Red Hat constraints, and added support for gpt-5, modeling it after the other o-series entries.
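My guess is that RHEL's crypto policy (FIPS mode in particular) is what rejects MD5, and as far as I can tell the hash is only used as an identifier, so SHA-256 drops in cleanly. A minimal sketch of the kind of change, with the function name my own illustration rather than OpenEvolve's actual code:

```python
import hashlib

# Hypothetical sketch of the swap, not OpenEvolve's actual code:
# derive a stable identifier from program source text.
def program_id(code: str) -> str:
    # Previously: hashlib.md5(code.encode("utf-8")).hexdigest()
    # MD5 can be rejected under RHEL's crypto policy, so use SHA-256 instead.
    return hashlib.sha256(code.encode("utf-8")).hexdigest()
```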
For a test task, I used the classic Two Sum problem in Python (scan a list and find two numbers that add up to a target). With the default settings, OpenEvolve failed to reach the optimal solution and settled on the O(n²) brute-force approach as the best program.
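For reference, the brute-force baseline it kept settling on looks roughly like this (my own minimal version of the classic approach, not the exact evolved program):

```python
def two_sum_bruteforce(nums: list[int], target: int) -> tuple[int, int] | None:
    # Try every pair of indices: O(n^2) time, O(1) extra space.
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return i, j
    return None  # no pair sums to the target
```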
Digging into the logs (and checking with AI) revealed that evaluator.py sets the optimization objective. The defaults overweight correctness and rely on overly simple tests. I expanded the test suite with edge cases and larger inputs, rebalanced the objective to 50/50 between correctness and efficiency (instead of 70/30), and, last but not least, enabled full rewrites rather than the default diff-based evolution.
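To make that concrete, here is a minimal sketch of the reweighted scoring under my own assumptions: the test cases and the timing-based efficiency metric are illustrative, and the example's real evaluator.py differs in detail.

```python
import time

# Assumed test cases: (nums, target, expected). The edge cases and the larger
# input mirror the kind of tests I added, but this exact suite is illustrative.
TEST_CASES = [
    ([2, 7, 11, 15], 9, (0, 1)),
    ([3, 3], 6, (0, 1)),                 # duplicate values
    ([1, 2], 4, None),                   # no valid pair
    (list(range(5_000)), 10**6, None),   # larger input that punishes O(n^2)
]

def evaluate(two_sum) -> dict:
    """Score a candidate two_sum(nums, target) implementation."""
    passed = 0
    start = time.perf_counter()
    for nums, target, expected in TEST_CASES:
        try:
            if two_sum(list(nums), target) == expected:
                passed += 1
        except Exception:
            pass  # a crash simply counts as a failed test
    elapsed = time.perf_counter() - start

    correctness = passed / len(TEST_CASES)
    efficiency = 1.0 / (1.0 + elapsed)   # faster runs score closer to 1.0
    # 50/50 split instead of the default correctness-heavy weighting.
    score = 0.5 * correctness + 0.5 * efficiency
    return {"correctness": correctness, "efficiency": efficiency, "combined_score": score}
```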
After rerunning, OpenEvolve discovered the optimal O(n) solution in a few iterations. My takeaway: the hard part is crafting the evaluator—the tests and weightings heavily steer outcomes. I also noticed it starts from an initial program rather than generating from scratch, which I hadn’t expected.
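For completeness, the O(n) shape it converged on is the standard hash-map approach; this is my own restatement rather than the evolved program verbatim:

```python
def two_sum(nums: list[int], target: int) -> tuple[int, int] | None:
    # Single pass with a value -> index map: O(n) time, O(n) extra space.
    seen: dict[int, int] = {}
    for i, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            return seen[complement], i
        seen[value] = i
    return None  # no pair sums to the target
```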
Overall, in 1–2 hours I got OpenEvolve working and producing useful results. I’m pleased with the progress and plan to keep exploring; it’s a very promising open-source project.