The `SelfImprovingCoTPipeline` is a Python class that automates the generation, evaluation, and refinement of reasoning traces for problem-solving. This article and the accompanying code were generated with the assistance of Grok, an AI language model. The pipeline is built on the CAMEL agent framework, which powers its core reasoning and evaluation capabilities. Inspired by the Self-Taught Reasoner (STaR) methodology, it excels at tasks that require step-by-step reasoning, such as math problems and logical puzzles.
This pipeline implements a self-improving Chain of Thought (CoT) process with four key steps (sketched in code after this list):

- **Generate**: Produces an initial reasoning trace for a given problem.
- **Evaluate**: Assesses the trace’s quality using an agent or reward model.
- **Improve**: Refines the trace based on feedback if it doesn’t meet quality standards.
- **Repeat**: Iterates until the trace is satisfactory or a maximum iteration limit is reached.
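A minimal sketch of that loop, with the generator, evaluator, and improver passed in as plain callables; the function name, defaults, and score aggregation here are illustrative assumptions, not taken from the pipeline’s source:

```python
from typing import Callable, Dict, List

def self_improving_cot(
    problem: str,
    generate: Callable[[str], str],                     # produces an initial trace
    evaluate: Callable[[str, str], Dict[str, float]],   # scores a trace
    improve: Callable[[str, str, Dict[str, float]], str],  # refines a trace
    max_iterations: int = 3,
    score_threshold: float = 0.8,
) -> dict:
    """Sketch of the generate-evaluate-improve loop."""
    trace = generate(problem)                    # Step 1: generate an initial trace
    history: List[dict] = []
    for _ in range(max_iterations):              # Step 4: repeat, with a hard cap
        scores = evaluate(problem, trace)        # Step 2: score the current trace
        history.append({"trace": trace, "scores": scores})
        if min(scores.values()) >= score_threshold:
            break                                # quality threshold met; stop early
        trace = improve(problem, trace, scores)  # Step 3: refine using feedback
    return {"problem": problem, "final_trace": trace, "history": history}
```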
It efficiently handles multiple problems in parallel, leveraging batch processing and threading.
Key features include:

- **Iterative Refinement**: Enhances reasoning traces over multiple iterations based on feedback.
- **Flexible Evaluation**: Supports agent-based scoring (correctness, clarity, completeness) or reward model-based evaluation.
- **Parallel Processing**: Processes multiple problems concurrently with dynamic batch sizing (see the sketch after this list).
- **Output Options**: Saves results to a JSON file or returns them as a list.
- **Customizable**: Allows configuration of thresholds, iterations, and templates.
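A minimal sketch of the batched, threaded execution, assuming a hypothetical per-problem worker `process_one` and a fixed batch size (the real pipeline sizes batches dynamically):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def process_in_batches(
    problems: List[dict],
    process_one: Callable[[dict], dict],
    batch_size: int = 4,
) -> List[dict]:
    """Process problems batch by batch on a thread pool (sketch)."""
    results: List[dict] = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(problems), batch_size):
            batch = problems[start:start + batch_size]
            # map() preserves input order within each batch
            results.extend(pool.map(process_one, batch))
    return results
```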
The pipeline is initialized with (a constructor sketch follows the list):

- A `reason_agent` (a `ChatAgent` from the CAMEL framework) to generate and improve traces.
- A list of problems (dictionaries with at least a `"problem"` key).
- Optional parameters like `max_iterations`, `score_threshold`, and an `evaluate_agent` or `reward_model`.
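Putting those parameters together, the constructor likely resembles the following sketch; the defaults shown are illustrative assumptions, not taken from the source:

```python
from typing import List, Optional

class SelfImprovingCoTPipeline:
    def __init__(
        self,
        reason_agent,                       # ChatAgent that generates and improves traces
        problems: List[dict],               # dicts with at least a "problem" key
        max_iterations: int = 3,            # cap on evaluate/improve rounds per problem
        score_threshold: float = 0.8,       # minimum acceptable evaluation score
        evaluate_agent=None,                # optional ChatAgent for agent-based scoring
        reward_model=None,                  # alternative: reward-model evaluation
        output_path: Optional[str] = None,  # optional JSON file for results
    ):
        ...
```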
For each problem, the pipeline proceeds as follows (an evaluation sketch appears after this list):

1. An initial reasoning trace is generated.
2. The trace is evaluated:
   - **Agent-Based**: Scores correctness, clarity, and completeness.
   - **Reward Model**: Uses model-specific dimensions.
3. If the trace falls below the quality threshold, it’s refined using feedback.
4. Steps 2-3 repeat up to `max_iterations` or until the threshold is met.
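A minimal sketch of the agent-based evaluation, assuming CAMEL’s `ChatAgent.step` accepts a prompt string and returns a response whose `.msg.content` holds the reply; the prompt wording and JSON contract are assumptions for illustration:

```python
import json

def evaluate_with_agent(evaluate_agent, problem: str, trace: str) -> dict:
    """Ask the evaluation agent for correctness/clarity/completeness scores (sketch)."""
    prompt = (
        "Rate the following reasoning trace on correctness, clarity, and "
        "completeness, each from 0.0 to 1.0. Reply with JSON only, e.g. "
        '{"correctness": 0.9, "clarity": 0.8, "completeness": 0.7}\n\n'
        f"Problem: {problem}\n\nTrace: {trace}"
    )
    response = evaluate_agent.step(prompt)    # send the prompt to the agent
    return json.loads(response.msg.content)   # parse the JSON scores from the reply
```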
Results include the final trace, evaluation success, and improvement history, optionally saved to a JSON file.
The example below shows end-to-end setup and execution:

```python
from camel.agents import ChatAgent
from self_improving_cot_pipeline import SelfImprovingCoTPipeline

# Define agents using the CAMEL framework
reason_agent = ChatAgent(model="gpt-4")
evaluate_agent = ChatAgent(model="gpt-4")

# Define problems
problems = [
    {"id": "1", "problem": "Solve 2x + 3 = 7", "solution": "x = 2"},
    {"id": "2", "problem": "Integrate x^2", "solution": "x^3/3 + C"},
]

# Initialize the pipeline
pipeline = SelfImprovingCoTPipeline(
    reason_agent=reason_agent,
    problems=problems,
    max_iterations=3,
    score_threshold=0.8,
    evaluate_agent=evaluate_agent,
    output_path="results.json",
)

# Generate refined reasoning traces
results = pipeline.generate()
```
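Each entry in `results` carries the final trace and improvement history described above, and the same records are written to `results.json` since `output_path` was set. The field names below are assumptions for illustration:

```python
# Inspect the outcome for each problem (field names are illustrative)
for record in results:
    print(record["id"], "->", record["final_trace"][:80])
```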