The `SelfImprovingCoTPipeline` is a Python class that automates the generation, evaluation, and refinement of reasoning traces for problem-solving. This article and the accompanying code were generated with the assistance of Grok, an AI language model. The pipeline is built on the CAMEL agent framework, which powers its core reasoning and evaluation capabilities. Inspired by the Self-Taught Reasoner (STaR) methodology, it excels at tasks that require step-by-step reasoning, such as math problems and logical puzzles.
This pipeline implements a self-improving Chain of Thought (CoT) process with four key steps (sketched in code after this list):

- **Generate**: Produces an initial reasoning trace for a given problem.
- **Evaluate**: Assesses the trace’s quality using an agent or reward model.
- **Improve**: Refines the trace based on feedback if it doesn’t meet quality standards.
- **Repeat**: Iterates until the trace is satisfactory or a maximum iteration limit is reached.
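A minimal sketch of that loop, with the generator, evaluator, and improver passed in as plain callables; the function name, defaults, and score aggregation here are illustrative assumptions, not taken from the pipeline’s source:

```python
from typing import Callable, Dict, List

def self_improving_cot(
    problem: str,
    generate: Callable[[str], str],                     # produces an initial trace
    evaluate: Callable[[str, str], Dict[str, float]],   # scores a trace
    improve: Callable[[str, str, Dict[str, float]], str],  # refines a trace
    max_iterations: int = 3,
    score_threshold: float = 0.8,
) -> dict:
    """Sketch of the generate-evaluate-improve loop."""
    trace = generate(problem)                    # Step 1: generate an initial trace
    history: List[dict] = []
    for _ in range(max_iterations):              # Step 4: repeat, with a hard cap
        scores = evaluate(problem, trace)        # Step 2: score the current trace
        history.append({"trace": trace, "scores": scores})
        if min(scores.values()) >= score_threshold:
            break                                # quality threshold met; stop early
        trace = improve(problem, trace, scores)  # Step 3: refine using feedback
    return {"problem": problem, "final_trace": trace, "history": history}
```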
It efficiently handles multiple problems in parallel, leveraging batch processing and threading.
Key features include:

- **Iterative Refinement**: Enhances reasoning traces over multiple iterations based on feedback.
- **Flexible Evaluation**: Supports agent-based scoring (correctness, clarity, completeness) or reward model-based evaluation.
- **Parallel Processing**: Processes multiple problems concurrently with dynamic batch sizing (see the sketch after this list).
- **Output Options**: Saves results to a JSON file or returns them as a list.
- **Customizable**: Allows configuration of thresholds, iterations, and templates.
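A minimal sketch of the batched, threaded execution, assuming a hypothetical per-problem worker `process_one` and a fixed batch size (the real pipeline sizes batches dynamically):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def process_in_batches(
    problems: List[dict],
    process_one: Callable[[dict], dict],
    batch_size: int = 4,
) -> List[dict]:
    """Process problems batch by batch on a thread pool (sketch)."""
    results: List[dict] = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(problems), batch_size):
            batch = problems[start:start + batch_size]
            # map() preserves input order within each batch
            results.extend(pool.map(process_one, batch))
    return results
```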
The pipeline is initialized with (a constructor sketch follows the list):

- A `reason_agent` (a `ChatAgent` from the CAMEL framework) to generate and improve traces.
- A list of problems (dictionaries with at least a `"problem"` key).
- Optional parameters like `max_iterations`, `score_threshold`, and an `evaluate_agent` or `reward_model`.
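Putting those parameters together, the constructor likely resembles the following sketch; the defaults shown are illustrative assumptions, not taken from the source:

```python
from typing import List, Optional

class SelfImprovingCoTPipeline:
    def __init__(
        self,
        reason_agent,                       # ChatAgent that generates and improves traces
        problems: List[dict],               # dicts with at least a "problem" key
        max_iterations: int = 3,            # cap on evaluate/improve rounds per problem
        score_threshold: float = 0.8,       # minimum acceptable evaluation score
        evaluate_agent=None,                # optional ChatAgent for agent-based scoring
        reward_model=None,                  # alternative: reward-model evaluation
        output_path: Optional[str] = None,  # optional JSON file for results
    ):
        ...
```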
For each problem, the pipeline proceeds as follows (an evaluation sketch appears after this list):

1. An initial reasoning trace is generated.
2. The trace is evaluated:
   - **Agent-Based**: Scores correctness, clarity, and completeness.
   - **Reward Model**: Uses model-specific dimensions.
3. If the trace falls below the quality threshold, it’s refined using feedback.
4. Steps 2-3 repeat up to `max_iterations` or until the threshold is met.
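A minimal sketch of the agent-based evaluation, assuming CAMEL’s `ChatAgent.step` accepts a prompt string and returns a response whose `.msg.content` holds the reply; the prompt wording and JSON contract are assumptions for illustration:

```python
import json

def evaluate_with_agent(evaluate_agent, problem: str, trace: str) -> dict:
    """Ask the evaluation agent for correctness/clarity/completeness scores (sketch)."""
    prompt = (
        "Rate the following reasoning trace on correctness, clarity, and "
        "completeness, each from 0.0 to 1.0. Reply with JSON only, e.g. "
        '{"correctness": 0.9, "clarity": 0.8, "completeness": 0.7}\n\n'
        f"Problem: {problem}\n\nTrace: {trace}"
    )
    response = evaluate_agent.step(prompt)    # send the prompt to the agent
    return json.loads(response.msg.content)   # parse the JSON scores from the reply
```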
Results include the final trace, evaluation success, and improvement history, optionally saved to a JSON file.
The example below shows end-to-end setup and execution:

```python
from camel.agents import ChatAgent
from self_improving_cot_pipeline import SelfImprovingCoTPipeline

# Define agents using the CAMEL framework
reason_agent = ChatAgent(model="gpt-4")
evaluate_agent = ChatAgent(model="gpt-4")

# Define problems
problems = [
    {"id": "1", "problem": "Solve 2x + 3 = 7", "solution": "x = 2"},
    {"id": "2", "problem": "Integrate x^2", "solution": "x^3/3 + C"},
]

# Initialize the pipeline
pipeline = SelfImprovingCoTPipeline(
    reason_agent=reason_agent,
    problems=problems,
    max_iterations=3,
    score_threshold=0.8,
    evaluate_agent=evaluate_agent,
    output_path="results.json",
)

# Generate refined reasoning traces
results = pipeline.generate()
```
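Each entry in `results` carries the final trace and improvement history described above, and the same records are written to `results.json` since `output_path` was set. The field names below are assumptions for illustration:

```python
# Inspect the outcome for each problem (field names are illustrative)
for record in results:
    print(record["id"], "->", record["final_trace"][:80])
```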