Self-Improving Chain of Thought Pipeline in Python
The SelfImprovingCoTPipeline is a Python class that automates the generation, evaluation, and refinement of reasoning traces for problem-solving. This article and the accompanying code were generated with the assistance of Grok, an AI language model. The pipeline itself is built on the CAMEL agent framework, which powers its core reasoning and evaluation capabilities. Inspired by the Self-Taught Reasoner (STaR) methodology, it excels at tasks that require step-by-step reasoning, such as math problems or logical puzzles.
What It Does
This pipeline implements a self-improving Chain of Thought (CoT) process with four key steps (sketched in code below):
Generate: Produces an initial reasoning trace for a given problem.
Evaluate: Assesses the trace’s quality using an agent or reward model.
Improve: Refines the trace based on feedback if it doesn’t meet quality standards.
Repeat: Iterates until the trace is satisfactory or a maximum iteration limit is reached.
It efficiently handles multiple problems in parallel, leveraging batch processing and threading.
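The core loop can be summarized in a few lines of Python. This is a minimal sketch, not the CAMEL implementation: the generate, evaluate, and improve callables stand in for the pipeline's internal agent calls, and the assumption that the evaluator returns per-dimension scores between 0 and 1 is illustrative.

```python
from typing import Callable, Dict

def self_improve(
    problem: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Dict[str, float]],
    improve: Callable[[str, str, Dict[str, float]], str],
    max_iterations: int = 3,
    score_threshold: float = 0.8,
) -> str:
    """Generate a reasoning trace, then iteratively evaluate and refine it."""
    trace = generate(problem)                      # Generate
    for _ in range(max_iterations):
        scores = evaluate(problem, trace)          # Evaluate
        if all(s >= score_threshold for s in scores.values()):
            break                                  # Threshold met: stop early
        trace = improve(problem, trace, scores)    # Improve using feedback
    return trace
```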
Key Features
Iterative Refinement: Enhances reasoning traces over multiple iterations based on feedback.
Flexible Evaluation: Supports agent-based scoring (correctness, clarity, completeness) or reward model-based evaluation.
Parallel Processing: Processes multiple problems concurrently with dynamic batch sizing (see the sketch after this list).
Output Options: Saves results to a JSON file or returns them as a list.
Customizable: Allows configuration of thresholds, iterations, and templates.
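The parallel-processing feature can be pictured with a small, generic sketch. This is not the pipeline's actual batching code (which sizes batches dynamically); it only illustrates the idea of dispatching fixed-size batches of problems to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

def process_in_batches(problems, process_one, batch_size=8, max_workers=4):
    """Process problems batch by batch, running each batch on a thread pool."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(problems), batch_size):
            batch = problems[start:start + batch_size]
            results.extend(pool.map(process_one, batch))
    return results
```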
How It Works
Initialization
The pipeline is initialized with:
A reason_agent (a ChatAgent from the CAMEL framework) to generate and improve traces.
A list of problems (dictionaries with at least a "problem" key; see the example after this list).
Optional parameters like max_iterations, score_threshold, and an evaluate_agent or reward_model.
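As a concrete illustration of the problems input: each entry is a dictionary with a required "problem" key, and extra keys such as "id" and "solution" appear in the article's own example. Treat the set of recognized optional keys as an assumption and verify it against the linked source.

```python
# Each entry needs a "problem" key; "id" and "solution" are optional extras.
problems = [
    {"id": "1", "problem": "Solve 2x + 3 = 7", "solution": "x = 2"},
    {"problem": "What is the derivative of sin(x)?"},
]
```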
Processing
For each problem:
An initial reasoning trace is generated.
The trace is evaluated:
Agent-Based: Scores correctness, clarity, and completeness.
Reward Model: Uses model-specific dimensions.
If the trace falls below the quality threshold, it’s refined using feedback.
The evaluation and improvement steps repeat until the threshold is met or max_iterations is reached; a simplified version of the threshold check is sketched below.
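A simplified agent-based threshold check might look like the following. The dimension names come from the article (correctness, clarity, completeness); requiring every dimension to meet the threshold is an assumption for illustration, and the real pipeline's aggregation may differ.

```python
def trace_meets_threshold(scores: dict, threshold: float = 0.8) -> bool:
    """Return True when every evaluated dimension reaches the threshold."""
    dimensions = ("correctness", "clarity", "completeness")
    return all(scores.get(d, 0.0) >= threshold for d in dimensions)

# Example: correctness and clarity pass, completeness does not -> refine again.
print(trace_meets_threshold(
    {"correctness": 0.9, "clarity": 0.85, "completeness": 0.7}
))  # False
```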
Output
Results include the final trace, evaluation success, and improvement history, optionally saved to a JSON file.
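For orientation, a single result entry might look roughly like this. The field names below follow the article's description (final trace, evaluation success, improvement history) and are illustrative assumptions, not the verbatim schema from the CAMEL source.

```python
# Illustrative result entry (field names are assumptions based on the text).
example_result = {
    "id": "1",
    "problem": "Solve 2x + 3 = 7",
    "final_trace": "Subtract 3 from both sides: 2x = 4. Divide by 2: x = 2.",
    "evaluate_success": True,
    "improvement_history": [
        {"iteration": 1, "scores": {"correctness": 0.6}, "trace": "..."},
    ],
}
```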
Example Usage
```python
from camel.agents import ChatAgent
from camel.datagen.self_improving_cot import SelfImprovingCoTPipeline

# Agents that generate/refine traces and evaluate them; the model argument
# follows the article's original example.
reason_agent = ChatAgent(model="gpt-4")
evaluate_agent = ChatAgent(model="gpt-4")

problems = [
    {"id": "1", "problem": "Solve 2x + 3 = 7", "solution": "x = 2"},
    {"id": "2", "problem": "Integrate x^2", "solution": "x^3/3 + C"},
]

pipeline = SelfImprovingCoTPipeline(
    reason_agent=reason_agent,
    problems=problems,
    max_iterations=3,
    score_threshold=0.8,
    evaluate_agent=evaluate_agent,
    output_path="results.json",
)

results = pipeline.generate()
```
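With output_path set, the run above writes the results to results.json in addition to returning them as a list. Assuming each entry carries the fields sketched in the Output section (an assumption, not verified against the source), a quick inspection could look like:

```python
for result in results:
    # Print a short preview of each final reasoning trace.
    print(result.get("id"), "->", str(result.get("final_trace"))[:80])
```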
Why Use It?
Automation: Streamlines reasoning trace creation and refinement.
Scalability: Efficiently processes large problem sets.
Quality: Ensures high-quality traces through self-improvement.
Conclusion
The SelfImprovingCoTPipeline, built within the CAMEL agent framework and generated with the help of Grok, is a powerful tool for creating robust reasoning traces. Its flexibility and efficiency make it well suited to educational tools, research, or any application that needs detailed problem-solving steps. Explore the full implementation for customization options and advanced features: https://github.com/camel-ai/camel/blob/4517742fcb25acbf2931d5fc076cecee4cbfb183/camel/datagen/self_improving_cot.py