In this tutorial, we optimize GPT-4.1 Mini's Chain of Thought (dspy.ChainOfThought) for generating profitable trading strategies using the dspy.GEPA optimizer! We demonstrate how to evolve prompts for different strategy themes (momentum, mean reversion, breakout, arbitrage, volume) with proper risk management.
This implementation realizes Kagen Atkinson's vision of an autonomous LLM-powered trading system with:
- Theme-based Specialization:
--theme
flag for different strategy types (as Kagen intended) - Offline Research Loop (
test_gepa_enhanced.py
): GEPA evolves theme-specific prompts through reflective optimization - Deterministic Execution (
run_gepa_trading.py
): Uses evolved prompts for strategy generation and backtesting - Complete Separation: Research and execution remain isolated with no LLM calls in live trading
import argparse
import dspy
import numpy as np
import pandas as pd
from pathlib import Path
import pickle
# Configure LLM
lm = dspy.LM("openai/gpt-4.1-mini", temperature=0.7, max_tokens=4000)
dspy.configure(lm=lm)
# Strategy themes configuration
STRATEGY_THEMES = {
'momentum': {
'description': 'RSI-EMA momentum strategies',
'indicators': ['rsi', 'ema', 'macd'],
'target_sharpe': 2.0,
'target_win_rate': 0.55
},
'mean_reversion': {
'description': 'Bollinger Bands and RSI mean reversion',
'indicators': ['bb', 'rsi', 'stoch'],
'target_sharpe': 1.8,
'target_win_rate': 0.60
},
'breakout': {
'description': 'Support/resistance and volume breakout',
'indicators': ['atr', 'volume', 'donchian'],
'target_sharpe': 2.2,
'target_win_rate': 0.45
},
'arbitrage': {
'description': 'Statistical arbitrage and pairs trading',
'indicators': ['correlation', 'zscore', 'cointegration'],
'target_sharpe': 3.0,
'target_win_rate': 0.65
},
'volume': {
'description': 'Volume-based and order flow strategies',
'indicators': ['vwap', 'obv', 'volume_profile'],
'target_sharpe': 1.9,
'target_win_rate': 0.52
}
}
We use 1-minute BTCUSDT data for high-frequency strategy development:
def load_btc_data():
"""Load BTC/USDT 1-minute data for backtesting"""
data_path = Path("data/btcusdt_1m_2024.csv")
df = pd.read_csv(data_path)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
# Calculate returns for metrics
df['returns'] = df['close'].pct_change()
return df
# Load data
btc_data = load_btc_data()
print(f"Loaded {len(btc_data)} candles from {btc_data.index[0]} to {btc_data.index[-1]}")
The core module generates RSI-EMA momentum strategies with risk management:
class TradingStrategySignature(dspy.Signature):
"""Generate a profitable RSI-EMA momentum trading strategy"""
market_context: str = dspy.InputField(
desc="Current market conditions and data characteristics"
)
strategy: str = dspy.OutputField(
desc="Complete trading strategy with entry/exit rules and risk management"
)
class TradingStrategyModule(dspy.Module):
def __init__(self):
self.generate_strategy = dspy.ChainOfThought(TradingStrategySignature)
def forward(self, market_context):
# Generate strategy using evolved prompt
result = self.generate_strategy(market_context=market_context)
return dspy.Prediction(strategy=result.strategy)
GEPA supports multi-objective optimization without collapsing metrics into a single scalar. This enables true Pareto-optimal solutions:
from dspy.evaluate.evaluate import ScoreWithFeedback
import numpy as np
def create_pareto_metric(theme='momentum'):
"""
Multi-objective metric that preserves individual objectives for Pareto optimization.
GEPA will maintain a Pareto frontier of non-dominated solutions.
"""
# Get theme configuration
theme_config = STRATEGY_THEMES.get(theme, STRATEGY_THEMES['momentum'])
target_sharpe = theme_config['target_sharpe']
target_win_rate = theme_config['target_win_rate']
def metric(gold: Example, pred, trace=None, pred_name=None, pred_trace=None):
try:
# Parse and validate strategy
strategy = pred.strategy
# Run backtest with walk-forward validation
results = backtest_with_validation(
strategy,
validation_method='walk_forward', # Proper out-of-sample testing
purge_embargo=True, # Purged cross-validation
realistic_costs=True # Include slippage, commissions, market impact
)
# Extract individual objectives (DO NOT COLLAPSE)
sharpe = results.get('sharpe_ratio', 0)
win_rate = results.get('win_rate', 0)
max_dd = abs(results.get('max_drawdown', 0))
profit_factor = results.get('profit_factor', 0)
# Create multi-dimensional score vector for Pareto optimization
# GEPA will use this to maintain non-dominated solutions
score_vector = {
'sharpe': sharpe / target_sharpe, # Normalized objectives
'win_rate': win_rate / target_win_rate,
'drawdown': 1 - max_dd, # Higher is better
'profit_factor': profit_factor / 2.0
}
# For GEPA's Pareto selection, use the minimum across objectives
# This ensures no single metric dominates
pareto_score = min(score_vector.values())
# Generate multi-objective feedback
feedback = generate_pareto_feedback(score_vector, theme_config)
return ScoreWithFeedback(
score=pareto_score,
feedback=feedback,
# Additional metadata for Pareto frontier tracking
metadata={'score_vector': score_vector}
)
except Exception as e:
return ScoreWithFeedback(
score=0.0,
feedback=f"Strategy failed: {str(e)}"
)
return metric
Proper backtesting with realistic evaluation to avoid common pitfalls:
def backtest_with_validation(strategy, market_data, validation_method='walk_forward',
purge_embargo=True, realistic_costs=True):
"""
Advanced backtesting with proper validation to avoid overfitting and bias.
Implements recommendations from recent finance ML reviews.
"""
if validation_method == 'walk_forward':
# Walk-Forward Analysis: Train on past, test on future, roll forward
results = walk_forward_analysis(
strategy=strategy,
data=market_data,
train_periods=252, # 1 year training
test_periods=63, # 3 months testing
step_size=21, # Reoptimize monthly
anchored=False # Expanding vs rolling window
)
elif validation_method == 'purged_cv':
# Purged Cross-Validation with embargo
results = purged_cross_validation(
strategy=strategy,
data=market_data,
n_splits=5,
purge_gap=10, # Gap between train/test to avoid leakage
embargo_pct=0.02 # 2% embargo after test set
)
elif validation_method == 'combinatorial':
# Combinatorial Purged CV (López de Prado method)
results = combinatorial_cv(
strategy=strategy,
data=market_data,
n_splits=6,
n_test_splits=2
)
# Apply realistic trading costs
if realistic_costs:
results = apply_realistic_costs(
results,
commission=0.001, # 10 bps
slippage=0.0005, # 5 bps
market_impact=lambda size: 0.0001 * np.sqrt(size), # Square-root impact
funding_rate=0.0001 # For leveraged positions
)
# Shadow evaluation: Compare with live baseline
results['shadow_performance'] = shadow_evaluation(
strategy_results=results,
baseline='buy_and_hold',
confidence_level=0.95
)
return results
def walk_forward_analysis(strategy, data, train_periods, test_periods,
step_size, anchored=False):
"""
Walk-forward optimization to avoid look-ahead bias.
Each window is optimized on training data and tested on unseen future data.
"""
results = []
for i in range(0, len(data) - train_periods - test_periods, step_size):
# Training window
if anchored:
train_start = 0 # Expanding window
else:
train_start = i # Rolling window
train_end = i + train_periods
# Test window (future unseen data)
test_start = train_end
test_end = test_start + test_periods
# Optimize on training data
optimized_params = optimize_strategy(
strategy,
data[train_start:train_end]
)
# Test on future data (no look-ahead)
test_results = backtest_strategy(
strategy,
data[test_start:test_end],
params=optimized_params
)
results.append(test_results)
# Aggregate out-of-sample results
return aggregate_walk_forward_results(results)
def purged_cross_validation(strategy, data, n_splits, purge_gap, embargo_pct):
"""
Purged CV to prevent leakage from serial correlation.
Includes embargo to prevent test set information leaking into nearby training data.
"""
from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=n_splits, gap=purge_gap)
results = []
for train_idx, test_idx in tscv.split(data):
# Apply embargo: remove observations after test set
embargo_size = int(len(data) * embargo_pct)
train_idx = train_idx[~np.isin(train_idx,
range(max(test_idx) + 1,
min(max(test_idx) + embargo_size + 1, len(data))))]
# Ensure no overlap with purge
train_idx = train_idx[train_idx < min(test_idx) - purge_gap]
# Backtest on purged data
train_data = data.iloc[train_idx]
test_data = data.iloc[test_idx]
test_results = backtest_strategy(strategy, test_data,
train_data=train_data)
results.append(test_results)
return aggregate_cv_results(results)
def shadow_evaluation(strategy_results, baseline='buy_and_hold', confidence_level=0.95):
"""
Shadow/parallel evaluation comparing strategy to simple baselines.
Helps detect overfitting and provides reality check.
"""
shadow_metrics = {
'outperformance': strategy_results['total_return'] - baseline_return,
'information_ratio': (strategy_results['mean_return'] - baseline_mean) / tracking_error,
'hit_rate': np.mean(strategy_returns > baseline_returns),
'max_underperformance': np.min(strategy_returns - baseline_returns),
'confidence_interval': bootstrap_confidence_interval(
strategy_returns - baseline_returns,
confidence_level
)
}
# Statistical significance tests
shadow_metrics['t_statistic'], shadow_metrics['p_value'] = stats.ttest_rel(
strategy_returns,
baseline_returns
)
return shadow_metrics
GEPA can be configured to maintain a Pareto frontier instead of collapsing to a single score:
def run_gepa_with_pareto(theme='momentum'):
"""
Run GEPA with multi-objective Pareto optimization.
Maintains a frontier of non-dominated solutions.
"""
# Initialize program
program = TradingStrategyModule()
# Create Pareto-aware metric
metric = create_pareto_metric(theme)
# Configure GEPA for Pareto optimization
gepa = dspy.GEPA(
metric=metric,
max_iterations=25,
verbose=True,
# Pareto-specific settings
candidate_selection_strategy="pareto", # Use Pareto dominance
pareto_objectives=['sharpe', 'win_rate', 'drawdown', 'profit_factor'],
maintain_frontier_size=20, # Keep top 20 non-dominated solutions
diversity_coefficient=0.8, # Higher diversity for frontier exploration
)
# Run optimization
print(f"Starting Pareto GEPA optimization for {theme} strategies...")
optimized_program = gepa.compile(
student=program,
trainset=train_examples,
max_eval_calls=200
)
# Save Pareto frontier
save_path = Path(f"data/gepa_logs/pareto/frontier_{theme}.pkl")
save_path.parent.mkdir(parents=True, exist_ok=True)
with open(save_path, 'wb') as f:
pickle.dump({
'pareto_frontier': gepa.pareto_frontier,
'dominated_solutions': gepa.dominated_solutions,
'objective_values': gepa.objective_values
}, f)
print(f"Pareto frontier saved with {len(gepa.pareto_frontier)} non-dominated solutions")
# Visualize frontier
plot_pareto_frontier(gepa.pareto_frontier)
return optimized_program
def plot_pareto_frontier(frontier):
"""
Visualize the Pareto frontier in objective space.
Shows trade-offs between different objectives.
"""
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Extract objectives
sharpe_values = [s['sharpe'] for s in frontier]
winrate_values = [s['win_rate'] for s in frontier]
drawdown_values = [s['drawdown'] for s in frontier]
# 3D Pareto frontier plot
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(sharpe_values, winrate_values, drawdown_values,
c='blue', marker='o', s=100, alpha=0.6)
ax.set_xlabel('Sharpe Ratio')
ax.set_ylabel('Win Rate')
ax.set_zlabel('1 - Max Drawdown')
ax.set_title('Pareto Frontier of Trading Strategies')
plt.show()
For comparison, here's the original single-scalar metric approach:
def create_trading_metric(theme='momentum'):
def metric(gold: Example, pred, trace=None, pred_name=None, pred_trace=None):
try:
# Parse and validate strategy
strategy = pred.strategy
# Extract strategy parameters
rsi_period = extract_rsi_period(strategy) # e.g., 14
ema_short = extract_ema_short(strategy) # e.g., 9
ema_long = extract_ema_long(strategy) # e.g., 21
# Run backtest
results = backtest_strategy(
btc_data,
rsi_period=rsi_period,
ema_short=ema_short,
ema_long=ema_long
)
# Calculate comprehensive metrics
sharpe = results.get('sharpe_ratio', 0)
win_rate = results.get('win_rate', 0)
max_dd = results.get('max_drawdown', 0)
profit_factor = results.get('profit_factor', 0)
# Composite score (0 to 1) - theme-aware
score = (
0.4 * min(sharpe / target_sharpe, 1.0) + # Theme-specific Sharpe target
0.3 * min(win_rate / target_win_rate, 1.0) + # Theme-specific win rate
0.2 * (1 - abs(max_dd)) + # Drawdown penalty
0.1 * min(profit_factor / 2.0, 1.0) # Target PF > 2
)
# Generate theme-specific actionable feedback
feedback = []
if sharpe < target_sharpe * 0.5:
feedback.append(f"Sharpe ratio too low for {theme} strategy. Target is {target_sharpe}. Adjust {expected_indicators[0]} parameters.")
elif sharpe > target_sharpe * 1.2:
feedback.append(f"Excellent Sharpe ratio for {theme}! Maintain current approach.")
if win_rate < target_win_rate * 0.8:
feedback.append(f"Win rate below target {target_win_rate:.0%} for {theme}. Adjust entry conditions.")
elif win_rate > target_win_rate * 1.2:
feedback.append(f"Strong win rate for {theme}. Ensure you're not over-fitting.")
if abs(max_dd) > 0.15:
feedback.append(f"Drawdown of {max_dd:.1%} is too high. Implement position sizing rules.")
if profit_factor < 1.2:
feedback.append("Profit factor needs improvement. Optimize your risk/reward ratios.")
# Add theme-specific indicator feedback
for indicator in expected_indicators[:2]:
feedback.append(f"Ensure {indicator} is properly configured for {theme} strategy.")
# Combine feedback
feedback_str = " ".join(feedback) if feedback else f"{theme.capitalize()} strategy performing well!"
return ScoreWithFeedback(
score=min(1.0, max(0.0, score)),
feedback=feedback_str
)
except Exception as e:
return ScoreWithFeedback(
score=0.0,
feedback=f"Strategy generation failed: {str(e)}. Ensure proper format for {theme} strategy."
)
return metric # Return the metric function
The optimization evolves theme-specific prompts through iterative reflection:
def run_gepa_optimization(theme='momentum'):
"""Run GEPA to evolve trading strategy prompts"""
# Initialize program
program = TradingStrategyModule()
# Create training examples
train_examples = [
dspy.Example(
market_context="High volatility BTC market with 1-minute data",
strategy="[Gold standard strategy would go here]"
).with_inputs("market_context")
for _ in range(10) # Create multiple examples
]
# Create theme-specific metric
metric = create_trading_metric(theme)
# Configure GEPA
gepa = dspy.GEPA(
metric=metric,
max_iterations=25,
verbose=True,
recall_k=3,
diversity_coefficient=0.7
)
# Run optimization
print(f"Starting GEPA optimization for {theme} strategies...")
optimized_program = gepa.compile(
student=program,
trainset=train_examples,
max_eval_calls=200
)
# Save evolved state with theme-specific filename
save_path = Path(f"data/gepa_logs/enhanced/gepa_state_{theme}.bin")
save_path.parent.mkdir(parents=True, exist_ok=True)
with open(save_path, 'wb') as f:
pickle.dump({
'program_candidates': gepa.program_candidates,
'validation_scores': gepa.validation_scores
}, f)
print(f"Optimization complete! Evolved prompt from {len(gepa.program_candidates[0])} to {len(gepa.program_candidates[-1])} characters")
return optimized_program
After GEPA optimization, the evolved prompts are used in production:
class GEPATradingSystem(dspy.Module):
"""Production trading system using GEPA-optimized prompts"""
def __init__(self, theme='momentum'):
self.theme = theme
self.load_evolved_prompt()
self.strategy_generator = dspy.ChainOfThought(TradingStrategySignature)
def load_evolved_prompt(self):
"""Load the best evolved prompt from GEPA optimization"""
state_file = Path(f"data/gepa_logs/enhanced/gepa_state_{self.theme}.bin")
with open(state_file, 'rb') as f:
state = pickle.load(f)
# Get the best performing prompt
candidates = state['program_candidates']
if len(candidates) > 1:
# Extract evolved prompt from best candidate
evolved_prompt = candidates[-1]['generate_strategy.predict']
# Apply to current module
self.apply_prompt(evolved_prompt)
print(f"Loaded evolved prompt ({len(evolved_prompt)} chars)")
def generate_signals(self, market_data):
"""Generate trading signals using evolved strategy"""
# Use evolved prompt to generate strategy
context = self.analyze_market(market_data)
strategy = self.strategy_generator(market_context=context)
# Convert strategy to signals
signals = self.strategy_to_signals(strategy.strategy, market_data)
return signals
def run_backtest(self, market_data):
"""Full backtest with risk management"""
signals = self.generate_signals(market_data)
# Run through backtesting engine
results = backtest_with_vectorbt(
data=market_data,
signals=signals,
commission=0.001,
slippage=0.001
)
return results
After 25 GEPA iterations, our system achieved:
# Final metrics from optimized strategy
results = {
"sharpe_ratio": 2.013,
"win_rate": 0.557,
"max_drawdown": -0.089,
"profit_factor": 1.64,
"total_trades": 342,
"annual_return": 0.487
}
print("=== GEPA-Optimized Strategy Performance ===")
for metric, value in results.items():
if "rate" in metric or "return" in metric:
print(f"{metric}: {value:.1%}")
else:
print(f"{metric}: {value:.3f}")
GEPA evolved our initial prompt from 66 characters:
"Generate a profitable RSI-EMA momentum trading strategy"
To 2,576 characters with detailed reasoning:
"Generate a profitable RSI-EMA momentum trading strategy.
CRITICAL REQUIREMENTS for high Sharpe ratio (>2.0):
1. Risk Management is PARAMOUNT:
- Position size: Maximum 2% risk per trade
- Stop loss: ATR-based, typically 1.5x ATR
- Take profit: Minimum 2:1 risk/reward ratio
2. Entry Conditions (ALL must be met):
- RSI between 30-70 (avoid extremes for false signals)
- Price above short EMA (9) for longs, below for shorts
- Short EMA crossing long EMA (21) in direction of trade
- Volume confirmation: Current > 1.2x average volume
3. Exit Conditions:
- Stop loss hit (preserve capital above all)
- Target reached (let winners run with trailing stop)
- RSI divergence (momentum weakening)
- EMA cross against position
4. Optimization Guidelines:
- RSI period: 14-21 (shorter = more signals, more false positives)
- EMA periods: 9/21 or 12/26 (classic combinations)
- Avoid overtrading: Maximum 10 trades per day
- Time filters: Avoid major news events +/- 30 minutes
Based on backtesting feedback:
- Sharpe < 1.5: Tighten stops, reduce position size
- Win rate < 45%: More selective entries, confirm with volume
- Drawdown > 15%: Implement maximum daily loss limit
- Profit factor < 1.2: Improve risk/reward targeting
Remember: Consistency beats home runs. Focus on risk-adjusted returns."
The system supports the --theme
flag for specialized strategy optimization:
# Optimize for momentum strategies (default)
python test_gepa_enhanced.py --theme momentum --iterations 25
# Optimize for mean reversion strategies
python test_gepa_enhanced.py --theme mean_reversion --iterations 25
# Optimize for breakout strategies with verbose output
python test_gepa_enhanced.py --theme breakout --iterations 30 --verbose
# Optimize for arbitrage strategies
python test_gepa_enhanced.py --theme arbitrage --iterations 25
# Optimize for volume-based strategies
python test_gepa_enhanced.py --theme volume --iterations 20
After optimization, run the trading system with the corresponding theme:
# Run trading with momentum theme
python run_gepa_trading.py --theme momentum
# Run trading with breakout theme and more attempts
python run_gepa_trading.py --theme breakout --max-attempts 15
# Run trading with arbitrage theme in verbose mode
python run_gepa_trading.py --theme arbitrage --verbose
Each theme optimizes for different targets:
Theme | Target Sharpe | Target Win Rate | Key Indicators |
---|---|---|---|
Momentum | 2.0 | 55% | RSI, EMA, MACD |
Mean Reversion | 1.8 | 60% | Bollinger Bands, RSI, Stochastic |
Breakout | 2.2 | 45% | ATR, Volume, Donchian Channels |
Arbitrage | 3.0 | 65% | Correlation, Z-Score, Cointegration |
Volume | 1.9 | 52% | VWAP, OBV, Volume Profile |
- Theme Specialization: Different strategy types require different optimization targets and indicators
- Iterative Refinement: GEPA discovered that explicit risk management rules dramatically improve Sharpe ratio
- Actionable Feedback: Theme-specific, measurable feedback in the metric function guides evolution effectively
- Prompt Complexity: Evolved prompts are significantly longer but produce more consistent strategies
- Separation of Concerns: Offline optimization (GEPA) remains completely separate from live execution
- Kagen's Vision Realized: The
--theme
flag enables specialized optimization exactly as envisioned
This tutorial demonstrated how GEPA can optimize prompts for quantitative trading strategies with theme specialization, achieving:
- 66 → 2,576 character prompt evolution through iterative optimization
- Theme-specific targets: Different Sharpe and win rate goals for each strategy type
- Complete separation between research and execution as Kagen envisioned
- Actionable feedback loop that guides prompt improvement based on theme
--theme
flag enabling specialization for momentum, mean reversion, breakout, arbitrage, and volume strategies
The system realizes the vision of autonomous LLM-powered trading research, where:
- GEPA continuously evolves better prompts offline
- Production systems use evolved prompts deterministically
- No LLM calls in the live trading loop
- Human traders review and approve strategy updates
- Multi-Asset Strategies: Extend to forex, commodities, and equities
- Alternative Indicators: Incorporate MACD, Bollinger Bands, Volume Profile
- Market Regime Detection: Adapt strategies to trending vs ranging markets
- Portfolio Optimization: Evolve prompts for multi-strategy allocation
- Walk-Forward Analysis: Continuous re-optimization on rolling windows
The complete implementation is available in two core files:
test_gepa_enhanced.py
: GEPA optimization looprun_gepa_trading.py
: Production trading system
These demonstrate the power of GEPA for evolving sophisticated trading strategies through reflective prompt optimization.