Skip to content

Instantly share code, notes, and snippets.

@chunhualiao
Last active August 21, 2025 00:19
Show Gist options
  • Save chunhualiao/571dd392121b6f7439cdab742063134c to your computer and use it in GitHub Desktop.
Save chunhualiao/571dd392121b6f7439cdab742063134c to your computer and use it in GitHub Desktop.
Evolving Two Sum: From O(n²) to O(n) with OpenEvolve

I spent today experimenting with OpenEvolve (the open-source counterpart to DeepMind’s AlphaEvolve) on a RHEL 9 laptop using the OpenAI API. To get it running, I fixed a small hashing issue—swapping hashlib.md5 for hashlib.sha256 due to Red Hat constraints—and added support for gpt-5, modeling it after the other o-series entries.

For a test task, I used the classic Two Sum Python problem (scan a list and find two numbers that add up to a target). With the default settings, OpenEvolve failed to reach the optimal solution and declared the O(n²) brute-force approach as the best.

Digging into the logs (and checking with AI) revealed that evaluator.py sets the optimization objective. The defaults overweight correctness and rely on overly simple tests. I expanded the test suite with edge cases and larger inputs, and rebalanced the objective to 50/50 for correctness and efficiency (instead of 70/30). Last but not least, I enabled full rewrite (not the default diff-based evolution).

After rerunning, OpenEvolve discovered the optimal O(n) solution in a few iterations. My takeaway: the hard part is crafting the evaluator—the tests and weightings heavily steer outcomes. I also noticed it starts from an initial program rather than generating from scratch, which I hadn’t expected.

Overall, in 1–2 hours I got OpenEvolve working and producing useful results. I’m pleased with the progress and plan to keep exploring; it’s a very promising open-source project.

@chunhualiao
Copy link
Author

# Configuration for LeetCode-style problem solving  
max_iterations: 50  
checkpoint_interval: 10  
log_level: "INFO"  
random_seed: 42
  
# LLM configuration  
llm:  
  primary_model: "gpt-4o"
  primary_model_weight: 0.8
  secondary_model: "gpt-5"
  secondary_model_weight: 0.2   
    
  api_base: "https://api.openai.com/v1"  
  temperature: 0.7
  top_p: 0.95
  max_tokens: 2048  
  timeout: 30  
  
# Prompt configuration  
prompt:  
  system_message: "You are an expert competitive programmer. Optimize the given code for correctness and efficiency."  
  num_top_programs: 3  
  num_diverse_programs: 2  
  use_template_stochasticity: true  
  
# Database configuration  
database:  
  population_size: 50  
  archive_size: 20  
  num_islands: 3  
  migration_interval: 20  
  migration_rate: 0.1  
    
  # Selection parameters  
  elite_selection_ratio: 0.2  
  exploration_ratio: 0.3  
  exploitation_ratio: 0.5  
  
# Evaluator configuration  
evaluator:  
  timeout: 60  
  max_retries: 2  
  parallel_evaluations: 2  
  cascade_evaluation: false  
  cascade_thresholds: [0.3, 0.6, 0.8]  
  enable_artifacts: true
  use_llm_feedback: false

# Evolution settings
# The allow_full_rewrites setting appears to be a legacy configuration option that has been 
# replaced by diff_based_evolution with inverted logic 
# (when diff_based_evolution is true, full rewrites are disabled).
diff_based_evolution: false

@chunhualiao
Copy link
Author

import time  
import traceback  
import random
from openevolve.evaluation_result import EvaluationResult  
  
def evaluate(program_path):  
    """  
    Evaluate the program by running test cases and measuring performance  
    """  
    try:  
        # Import the program  
        import importlib.util  
        import sys  
          
        spec = importlib.util.spec_from_file_location("solution", program_path)  
        module = importlib.util.module_from_spec(spec)  
        sys.modules["solution"] = module  
        spec.loader.exec_module(module)  
          
        if not hasattr(module, 'two_sum'):  
            return EvaluationResult(  
                metrics={"score": 0.0, "correctness": 0.0, "efficiency": 0.0},  
                artifacts={"error": "Function 'two_sum' not found"}  
            )  
          
        # Test cases - including edge cases and larger performance tests
        test_cases = [
            # Basic test cases
            ([2, 7, 11, 15], 9, [0, 1]),  
            ([3, 2, 4], 6, [1, 2]),  
            ([3, 3], 6, [0, 1]),  
            ([1, 2, 3, 4, 5], 8, [2, 4]),
            
            # Edge cases
            ([], 0, []),  # Empty array
            ([5], 5, []),  # Single element (no solution)
            ([1, 3], 5, []),  # No solution
            ([-1, -2, -3, -4], -7, [2, 3]),  # Negative numbers
            ([0, 0], 0, [0, 1]),  # Zeros
            
            # Medium-sized test
            (list(range(1000)), 999, [0, 999]),  # Solution at the end
            (list(range(1000)), 500, [0, 500]),  # Solution in the middle
            (list(range(1000)), 1, [0, 1]),  # Solution at the beginning
            
            # Large performance tests
            (list(range(10000)), 9999, [0, 9999]),  # 10K elements
            (list(range(-5000, 5000)), 0, [5000, 5000]),  # 10K elements with negatives
            
            # Very large test with random data
            (random.sample(range(-100000, 100000), 20000), 42, None)  # 20K random elements
        ]
        
        # For the last test case, find two numbers that sum to 42
        nums = test_cases[-1][0]
        target = 42
        solution_exists = False
        
        # Find a valid solution in the random array (if one exists)
        num_set = set()
        for i, num in enumerate(nums):
            complement = target - num
            if complement in num_set:
                for j, val in enumerate(nums):
                    if val == complement and j != i:
                        test_cases[-1] = (nums, target, [j, i])
                        solution_exists = True
                        break
                if solution_exists:
                    break
            num_set.add(num)
        
        # If no solution exists, create one
        if not solution_exists:
            i = random.randint(0, len(nums) - 1)
            j = random.randint(0, len(nums) - 1)
            while j == i:
                j = random.randint(0, len(nums) - 1)
            
            nums[i] = 20
            nums[j] = 22
            test_cases[-1] = (nums, target, [i, j])
          
        correctness_score = 0  
        total_time = 0  
        test_results = []
          
        for i, (nums, target, expected) in enumerate(test_cases):  
            try:  
                start_time = time.time()  
                result = module.two_sum(nums.copy(), target)  
                end_time = time.time()  
                  
                execution_time = end_time - start_time  
                total_time += execution_time  
                  
                # Check correctness
                is_correct = False
                if expected == []:  # No solution expected
                    is_correct = (result == [] or result == None or len(result) == 0)
                elif (isinstance(result, list) and len(result) == 2 and   
                    0 <= result[0] < len(nums) and 0 <= result[1] < len(nums) and  
                    result[0] != result[1] and  
                    nums[result[0]] + nums[result[1]] == target):
                    is_correct = True
                
                if is_correct:
                    correctness_score += 1
                    test_results.append(f"Test {i}: PASS ({execution_time:.6f}s)")
                else:
                    test_results.append(f"Test {i}: FAIL - Expected valid indices, got {result} ({execution_time:.6f}s)")
                      
            except Exception as e:  
                test_results.append(f"Test {i}: ERROR - {str(e)}")
                return EvaluationResult(  
                    metrics={"score": 0.0, "correctness": 0.0, "efficiency": 0.0},  
                    artifacts={
                        "error": f"Test case {i} failed: {str(e)}",
                        "test_results": "\n".join(test_results)
                    }  
                )  
          
        # Calculate metrics  
        correctness = correctness_score / len(test_cases)  
          
        # Efficiency score (lower time is better, max 10 seconds)
        # More aggressive scaling for large inputs
        efficiency = max(0, 1 - (total_time / 5.0))  
          
        # Combined score - equal weighting (50/50)
        score = 0.5 * correctness + 0.5 * efficiency  
          
        return EvaluationResult(  
            metrics={  
                "combined_score":score,                
                "correctness": correctness,  
                "efficiency": efficiency,  
                "total_time": total_time  
            },  
            artifacts={  
                "test_results": f"Passed {correctness_score}/{len(test_cases)} tests\n" + "\n".join(test_results),  
                "execution_time": f"{total_time:.6f} seconds"  
            }  
        )  
          
    except Exception as e:  
        return EvaluationResult(  
            metrics={"combined_score": 0.0, "correctness": 0.0, "efficiency": 0.0},  
            artifacts={  
                "error": str(e),  
                "traceback": traceback.format_exc()  
            }  
        )

@chunhualiao
Copy link
Author

patch to work with redhat 9 and GPT-5

diff --git a/openevolve/controller.py b/openevolve/controller.py
index 8100277..f55db0a 100644
--- a/openevolve/controller.py
+++ b/openevolve/controller.py
@@ -107,7 +107,7 @@ class OpenEvolve:
 
             # Create hash-based seeds for different components
             base_seed = str(self.config.random_seed).encode("utf-8")
-            llm_seed = int(hashlib.md5(base_seed + b"llm").hexdigest()[:8], 16) % (2**31)
+            llm_seed = int(hashlib.sha256(base_seed + b"llm").hexdigest()[:8], 16) % (2**31)
 
             # Propagate seed to LLM configurations
             self.config.llm.random_seed = llm_seed
diff --git a/openevolve/llm/openai.py b/openevolve/llm/openai.py
index d396dd1..77f04a8 100644
--- a/openevolve/llm/openai.py
+++ b/openevolve/llm/openai.py
@@ -66,8 +66,10 @@ class OpenAILLM(LLMInterface):
         formatted_messages.extend(messages)
 
         # Set up generation parameters
-        if self.api_base == "https://api.openai.com/v1" and str(self.model).lower().startswith("o"):
-            # For o-series models
+        model_lower = str(self.model).lower()
+        # Check if model is o-series or gpt-5, which require max_completion_tokens
+        if self.api_base == "https://api.openai.com/v1" and (model_lower.startswith("o") or model_lower.startswith("gpt-5")):
+            # For o-series models and gpt-5
             params = {
                 "model": self.model,
                 "messages": formatted_messages,

@chunhualiao
Copy link
Author

chunhualiao commented Aug 21, 2025

a log file

Program ID: cc0f70ed-2f47-43e2-9587-0b90b1321546
Checkpoint: examples/my_leetcode_program/openevolve_output/checkpoints/checkpoint_30
Island: 2
Generation: 3
Parent ID: 43636357-2231-4937-94e6-51d8f30e1e32
Metrics:
combined_score: 0.9997711896896362
correctness: 1.0
efficiency: 0.9995423793792725
total_time: 0.0022881031036376953
Code:
def two_sum(nums, target):
"""
Given an array of integers nums and an integer target,
return indices of the two numbers such that they add up to target.
"""
# Dictionary to map numbers to their indices
num_to_index = {}

for index, num in enumerate(nums):
    # Calculate the needed complement to reach the target
    complement = target - num
    
    # Check if the complement is already recorded in the dictionary
    if complement in num_to_index:
        return [num_to_index[complement], index]
    
    # Record the current number and its index in the dictionary
    num_to_index[num] = index

# A solution is guaranteed to exist, so this line is never reached
return []

if name == "main":
# Example usage
nums = [2, 7, 11, 15]
target = 9
result = two_sum(nums, target)
print(f"Result: {result}")
Prompts:
Full_rewrite_user
system:
You are an expert competitive programmer. Optimize the given code for correctness and efficiency.
user:

Current Program Information

  • Fitness: 0.9999
  • Feature coordinates: No feature coordinates
  • Focus areas: - Fitness improved: 0.9999 → 0.9999
  • Consider simplifying - code length exceeds 500 characters

Last Execution Output

test_results

Passed 15/15 tests
Test 0: PASS (0.000002s)
Test 1: PASS (0.000001s)
Test 2: PASS (0.000000s)
Test 3: PASS (0.000001s)
Test 4: PASS (0.000000s)
Test 5: PASS (0.000001s)
Test 6: PASS (0.000000s)
Test 7: PASS (0.000001s)
Test 8: PASS (0.000000s)
Test 9: PASS (0.000035s)
Test 10: PASS (0.000017s)
Test 11: PASS (0.000002s)
Test 12: PASS (0.000329s)
Test 13: PASS (0.000380s)
Test 14: PASS (0.000107s)

execution_time

0.000877 seconds

Program Evolution History

Previous Attempts

Attempt 3

  • Changes: Full rewrite
  • Metrics: combined_score: 0.9999, correctness: 1.0000, efficiency: 0.9998, total_time: 0.0010
  • Outcome: Mixed results

Attempt 2

  • Changes: Unknown changes
  • Metrics: combined_score: 0.9999, correctness: 1.0000, efficiency: 0.9998, total_time: 0.0009
  • Outcome: Improvement in all metrics

Attempt 1

  • Changes: Full rewrite
  • Metrics: combined_score: 0.9999, correctness: 1.0000, efficiency: 0.9998, total_time: 0.0008
  • Outcome: Mixed results

Top Performing Programs

Program 1 (Score: 0.9999)

def two_sum(nums, target):
    """
    Given an array of integers nums and an integer target,
    return indices of the two numbers such that they add up to target.
    """
    # Create a dictionary to store the potential matches
    num_to_index = {}
    
    # Iterate over each number in the list along with its index
    for index, num in enumerate(nums):
        # Calculate the complement that, when added to the current number, equals the target
        complement = target - num
        
        # Check if the complement is already in our dictionary
        if complement in num_to_index:
            # If it is, return the indices of the complement and the current number
            return [num_to_index[complement], index]
        
        # Otherwise, store the index of the current number in the dictionary
        # This allows us to check for complements in future iterations
        num_to_index[num] = index
    
    # Return an empty list if no solution is found, though per problem description, a solution always exists
    return []

if __name__ == "__main__":
    # Example usage
    nums = [2, 7, 11, 15]
    target = 9
    result = two_sum(nums, target)
    print(f"Result: {result}")

Key features: Performs well on combined_score (0.9999), Performs well on correctness (1.0000), Performs well on efficiency (0.9998), Performs well on total_time (0.0008)

Program 2 (Score: 0.9999)

def two_sum(nums, target):
    """
    Given an array of integers nums and an integer target,
    return indices of the two numbers such that they add up to target.
    """
    num_to_index = {}
    
    for index, num in enumerate(nums):
        complement = target - num
        if complement in num_to_index:
            return [num_to_index[complement], index]
        num_to_index[num] = index
    
    return []

if __name__ == "__main__":
    # Example usage
    nums = [2, 7, 11, 15]
    target = 9
    result = two_sum(nums, target)
    print(f"Result: {result}")

Key features: Performs well on combined_score (0.9999), Performs well on correctness (1.0000), Performs well on efficiency (0.9998), Performs well on total_time (0.0009)

Program 3 (Score: 0.9999)

def two_sum(nums, target):
    seen = {}
    get = seen.get
    for i, x in enumerate(nums):
        j = get(target - x)
        if j is not None:
            return [j, i]
        seen[x] = i
    return []

if __name__ == "__main__":
    nums = [2, 7, 11, 15]
    target = 9
    print(f"Result: {two_sum(nums, target)}")

Key features: Performs well on combined_score (0.9999), Performs well on correctness (1.0000), Performs well on efficiency (0.9998), Performs well on total_time (0.0010)

Diverse Programs

Program D1 (Score: 0.9999)

def two_sum(nums, target):
    """
    Given an array of integers nums and an integer target,
    return indices of the two numbers such that they add up to target.
    """
    num_to_index = {}
    
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_to_index:
            return [num_to_index[complement], i]
        num_to_index[num] = i
    
    # According to the problem statement, there is always exactly one solution,
    # so this return statement is technically not necessary.
    return []

if __name__ == "__main__":
    # Example usage
    nums = [2, 7, 11, 15]
    target = 9
    result = two_sum(nums, target)
    print(f"Result: {result}")

Key features: Alternative approach to combined_score, Alternative approach to correctness

Inspiration Programs

These programs represent diverse approaches and creative solutions that may inspire new ideas:

Inspiration 1 (Score: 0.9999, Type: High-Performer)

def two_sum(nums, target):
    """
    Given an array of integers nums and an integer target,
    return indices of the two numbers such that they add up to target.
    """
    # Create a dictionary to store the potential matches
    num_to_index = {}
    
    # Iterate over each number in the list along with its index
    for index, num in enumerate(nums):
        # Calculate the complement that, when added to the current number, equals the target
        complement = target - num
        
        # Check if the complement is already in our dictionary
        if complement in num_to_index:
            # If it is, return the indices of the complement and the current number
            return [num_to_index[complement], index]
        
        # Otherwise, store the index of the current number in the dictionary
        # This allows us to check for complements in future iterations
        num_to_index[num] = index
    
    # Return an empty list if no solution is found, though per problem description, a solution always exists
    return []

if __name__ == "__main__":
    # Example usage
    nums = [2, 7, 11, 15]
    target = 9
    result = two_sum(nums, target)
    print(f"Result: {result}")

Unique approach: Modification: Full rewrite, Excellent combined_score (1.000), Excellent correctness (1.000)

Inspiration 2 (Score: 0.9999, Type: High-Performer)

def two_sum(nums, target):
    """
    Given an array of integers nums and an integer target,
    return indices of the two numbers such that they add up to target.
    """
    num_to_index = {}
    
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_to_index:
            return [num_to_index[complement], i]
        num_to_index[num] = i
    
    # According to the problem statement, there is always exactly one solution,
    # so this return statement is technically not necessary.
    return []

if __name__ == "__main__":
    # Example usage
    nums = [2, 7, 11, 15]
    target = 9
    result = two_sum(nums, target)
    print(f"Result: {result}")

Unique approach: Modification: Full rewrite, Excellent combined_score (1.000), Excellent correctness (1.000)

Inspiration 3 (Score: 0.9999, Type: High-Performer)

def two_sum(nums, target):
    seen = {}
    get = seen.get
    for i, x in enumerate(nums):
        j = get(target - x)
        if j is not None:
            return [j, i]
        seen[x] = i
    return []

if __name__ == "__main__":
    nums = [2, 7, 11, 15]
    target = 9
    print(f"Result: {two_sum(nums, target)}")

Unique approach: Modification: Full rewrite, Excellent combined_score (1.000), Excellent correctness (1.000)

Current Program

def two_sum(nums, target):
    """
    Given an array of integers nums and an integer target,
    return indices of the two numbers such that they add up to target.
    """
    num_to_index = {}
    
    for index, num in enumerate(nums):
        complement = target - num
        if complement in num_to_index:
            return [num_to_index[complement], index]
        num_to_index[num] = index
    
    return []

if __name__ == "__main__":
    # Example usage
    nums = [2, 7, 11, 15]
    target = 9
    result = two_sum(nums, target)
    print(f"Result: {result}")

Task

Rewrite the program to improve its FITNESS SCORE.
The system maintains diversity across these dimensions: complexity, diversity
Different solutions with similar fitness but different features are valuable.
Provide the complete new program code.

IMPORTANT: Make sure your rewritten program maintains the same inputs and outputs
as the original program, but with improved internal implementation.

# Your rewritten program here

responses:
The current program is already highly optimized in terms of correctness and efficiency for the "Two Sum" problem. It uses a hash map to achieve a time complexity of O(n), which is optimal for this problem. Given the constraints and the existing performance, there isn't much room for improvement in terms of raw speed or complexity.

However, we can explore slight variations or stylistic improvements that might be beneficial in other contexts, such as readability or maintainability. Here's a version that focuses on using Python's dictionary methods more explicitly and ensuring the code is concise and clear:

def two_sum(nums, target):
    """
    Given an array of integers nums and an integer target,
    return indices of the two numbers such that they add up to target.
    """
    # Dictionary to map numbers to their indices
    num_to_index = {}
    
    for index, num in enumerate(nums):
        # Calculate the needed complement to reach the target
        complement = target - num
        
        # Check if the complement is already recorded in the dictionary
        if complement in num_to_index:
            return [num_to_index[complement], index]
        
        # Record the current number and its index in the dictionary
        num_to_index[num] = index

    # A solution is guaranteed to exist, so this line is never reached
    return []

if __name__ == "__main__":
    # Example usage
    nums = [2, 7, 11, 15]
    target = 9
    result = two_sum(nums, target)
    print(f"Result: {result}")

Key Points:

  • Clarity and Conciseness: The logic is kept clear and concise, with comments explaining each step.
  • Efficiency: The use of a dictionary ensures that lookups and insertions are O(1) on average, maintaining optimal efficiency.
  • Correctness: The function adheres to the problem constraints, ensuring that a solution is always found if it exists.

This solution maintains the same performance characteristics as the original while ensuring that the code is straightforward and easy to read.
Artifacts:
{"test_results": "Passed 15/15 tests\nTest 0: PASS (0.000004s)\nTest 1: PASS (0.000003s)\nTest 2: PASS (0.000001s)\nTest 3: PASS (0.000002s)\nTest 4: PASS (0.000002s)\nTest 5: PASS (0.000001s)\nTest 6: PASS (0.000001s)\nTest 7: PASS (0.000003s)\nTest 8: PASS (0.000001s)\nTest 9: PASS (0.000089s)\nTest 10: PASS (0.000043s)\nTest 11: PASS (0.000005s)\nTest 12: PASS (0.000824s)\nTest 13: PASS (0.000918s)\nTest 14: PASS (0.000391s)", "execution_time": "0.002288 seconds"}

@chunhualiao
Copy link
Author

Screenshot 2025-08-20 at 5 08 10 PM Screenshot 2025-08-20 at 5 08 35 PM Screenshot 2025-08-20 at 5 08 56 PM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment