Author: Albert Bentov Date: 2026-02-11 Status: Design Proposal
Before implementing ML-based routing, start with a spreadsheet-based interface per campaign where users input economics and receive deterministic model + token recommendations.
User Inputs (per campaign):
| Parameter | Description | Example Value |
|---|---|---|
| ecpm | Actual revenue per 1000 impressions ($) | $3.50 |
| target_roi | Required return on investment (%) | 150% (1.5x) |
| campaign_type | brand_awareness or performance | performance |
| max_cost_per_gen | Budget constraint per generation ($) | $0.010 |
| expected_ctr | Target click-through rate (%) | 0.4% |
| domain_tier | premium, standard, or low | standard |
| monthly_impressions | Forecasted impression volume | 800,000 |
System Recommendations (output):
| Output | Description | Example Value |
|---|---|---|
| model_tier | premium, balanced, or budget | budget |
| token_setting | Input context size | low (title + para1) |
| llm_model | Specific model to use | gemini-2-flash |
| image_model | Image generation model | flux-schnell |
| cost_per_generation | Expected generation cost | $0.0086 |
| breakeven_ecpm | Minimum eCPM for profitability at target ROI | $12.90 |
| monthly_cost | Projected monthly spend | $430 |
| monthly_profit | Projected profit (eCPM × volume − cost) | $-255 |
| recommendation | Profitability assessment | ⚠️ Not profitable at current eCPM |
Token setting options:

| Setting | Article Input | Input Tokens | Use Case |
|---|---|---|---|
| high | Full article (title + body) | ~2500 tokens | Premium campaigns, complex content |
| medium | Title + first 2-3 paragraphs | ~800 tokens | Balanced quality/cost |
| low | Title + first paragraph | ~400 tokens | Budget campaigns, simple content |
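As a sketch of how these settings could be applied when assembling model input (the function name and the blank-line paragraph-splitting rule are assumptions, not production code):

```python
def build_article_input(title: str, body: str, setting: str = 'low') -> str:
    """Assemble the article input per the token-setting table above.
    Assumes paragraphs are separated by blank lines."""
    paragraphs = [p.strip() for p in body.split('\n\n') if p.strip()]
    if setting == 'high':        # full article
        return f"{title}\n\n{body}"
    if setting == 'medium':      # title + first 2-3 paragraphs
        return f"{title}\n\n" + '\n\n'.join(paragraphs[:3])
    # low: title + first paragraph
    return f"{title}\n\n{paragraphs[0]}" if paragraphs else title
```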
```python
def recommend_model_config(
    ecpm: float,
    target_roi: float,
    max_cost_per_gen: float,
    campaign_type: str,
    domain_tier: str
) -> dict:
    """
    Deterministic model + token recommendation based on campaign economics.
    Returns config with profitability check and provider tier bins.
    """
    # Calculate max allowable cost per generation
    max_cost = (ecpm / 1000) / target_roi
    max_cost = min(max_cost, max_cost_per_gen)  # Respect budget constraint

    # Decision tree with provider tier bins
    if max_cost >= 0.040 and (domain_tier == 'premium' or campaign_type == 'brand_awareness'):
        return {
            'model_tier': 'premium',
            'token_setting': 'high',
            'llm_providers': 'OpenAI GPT (flagship), Anthropic Claude (flagship), Google Gemini (pro tier)',
            'image_providers': 'Google Imagen, Stability AI (premium), Midjourney',
            'cost_range': '$0.040 - $0.050',
            'cost_per_generation': 0.0443,
            'profitable': max_cost >= 0.0443
        }
    elif max_cost >= 0.010:
        return {
            'model_tier': 'balanced',
            'token_setting': 'medium',
            'llm_providers': 'Google Gemini (flash tier), Anthropic Claude (fast tier), Together AI',
            'image_providers': 'Replicate FLUX (dev tier), Stability AI (standard)',
            'cost_range': '$0.010 - $0.015',
            'cost_per_generation': 0.0117,
            'profitable': max_cost >= 0.0117
        }
    elif max_cost >= 0.003:
        return {
            'model_tier': 'budget',
            'token_setting': 'low',
            'llm_providers': 'Google Gemini (flash-lite), Fireworks AI, Anyscale',
            'image_providers': 'Replicate FLUX (fast tier), Leonardo AI (fast)',
            'cost_range': '$0.003 - $0.010',
            'cost_per_generation': 0.0086,
            'profitable': max_cost >= 0.0086
        }
    else:
        return {
            'model_tier': 'ultra-budget',
            'token_setting': 'low',
            'llm_providers': 'Groq (Llama/Mixtral), Together AI (open models), Hugging Face Inference',
            'image_providers': 'Replicate FLUX (schnell), Stability AI (turbo)',
            'cost_range': '$0.001 - $0.003',
            'cost_per_generation': 0.0032,
            'profitable': max_cost >= 0.0032
        }
```

Text Generation:
- OpenAI - GPT models (premium tier, high quality)
- Anthropic - Claude models (premium tier, reasoning-focused)
- Google - Gemini models (pro/flash tiers, good balance)
- Groq - Open models (Llama, Mixtral) with ultra-fast inference
- Together AI - Open source models, competitive pricing
- Fireworks AI - Fast inference, budget-friendly
- Anyscale - Llama/Mistral hosting, good for scale
- Replicate - Various open models
- Hugging Face - Inference API for open models
Image Generation:
- Google - Imagen models (premium quality)
- Stability AI - Stable Diffusion variants (multiple tiers)
- Replicate - FLUX, Stable Diffusion, various models
- Midjourney - High quality (limited API access)
- Leonardo AI - Fast generation, multiple style presets
- RunwayML - Creative AI tools, image generation
- OpenAI - DALL-E models (premium tier)
To populate the interface and validate economics, collect these metrics:
From the `generations` table (or equivalent):

```sql
-- Cost per generation by campaign
SELECT campaign_id,
       COUNT(*) as generation_count,
       AVG(model_cost) as avg_cost_per_gen,  -- if tracked
       AVG(token_count_input) as avg_input_tokens,
       AVG(token_count_output) as avg_output_tokens
FROM generations
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id;
```

From the `impressions` table:

```sql
-- Impressions and CTR by campaign/domain
SELECT campaign_id,
       domain,
       COUNT(*) as impression_count,
       SUM(CASE WHEN clicked = 1 THEN 1 ELSE 0 END) as click_count,
       AVG(CASE WHEN clicked = 1 THEN 1.0 ELSE 0.0 END) as ctr,
       AVG(ecpm) as avg_ecpm  -- if available
FROM impressions
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id, domain;
```

From RTB win data (if available):

```sql
-- Actual revenue per impression
SELECT campaign_id,
       AVG(win_price) as avg_win_price,
       COUNT(*) as win_count,
       AVG(bid_price) as avg_bid_price
FROM rtb_wins
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id;
```

From the `controlled_ads` table:

```sql
-- Human approval rate (quality proxy)
SELECT campaign_id,
       COUNT(*) as total_generated,
       SUM(CASE WHEN type = 2 THEN 1 ELSE 0 END) as approved_count,
       AVG(CASE WHEN type = 2 THEN 1.0 ELSE 0.0 END) as approval_rate
FROM controlled_ads
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id;
```
- Spreadsheet template (Google Sheets/Excel) with formulas:
  - Input cells: eCPM, ROI target, campaign type, etc.
  - Calculation cells: max allowable cost, breakeven eCPM
  - Recommendation cells: model tier, token setting, profitability check
- Simple Python script (`recommend_config.py`):
  - Read campaign params from CSV/JSON
  - Apply decision logic
  - Output recommended config + economics forecast
  - Flag unprofitable campaigns
- Dashboard integration (future):
  - Per-campaign config UI
  - Real-time cost/profit tracking
  - One-click apply to production
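A minimal sketch of the `recommend_config.py` flow (the CSV column names and the one-generation-per-impression assumption in the breakeven formula are mine, not production contracts):

```python
import csv
import json

def breakeven_ecpm(cost_per_gen: float, target_roi: float) -> float:
    # Minimum eCPM at which revenue per impression covers
    # target_roi x generation cost, assuming one generation per impression
    return cost_per_gen * target_roi * 1000

def recommend(row: dict) -> dict:
    """Apply the tier bins from recommend_model_config to one campaign row."""
    ecpm = float(row['ecpm'])
    max_cost = min(ecpm / 1000 / float(row['target_roi']),
                   float(row['max_cost_per_gen']))
    tier = ('premium' if max_cost >= 0.040 else
            'balanced' if max_cost >= 0.010 else
            'budget' if max_cost >= 0.003 else 'ultra-budget')
    return {'campaign_id': row['campaign_id'], 'tier': tier,
            'max_cost': round(max_cost, 5)}

def main(path: str) -> None:
    # Read campaign params from CSV, print one JSON recommendation per line
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            print(json.dumps(recommend(row)))
```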
Campaign A: Low-tier performance campaign
Inputs:
- eCPM: $0.80 (actual paid)
- Target ROI: 150% (1.5x)
- Max cost per gen: $0.010
- Campaign type: performance
- Expected CTR: 0.3%
- Domain tier: low
Calculations:
- Max allowable cost = ($0.80 / 1000) / 1.5 = $0.00053
- Budget constraint: $0.010
Recommendation:
- Model tier: budget (cost $0.0086)
- Token setting: low (400 tokens)
- Models: gemini-2-flash + flux-schnell
- Breakeven eCPM: $12.90 (at 1.5x ROI)
- Assessment: ⚠️ Not profitable (eCPM $0.80 is far below the $12.90 breakeven)
- Suggestion: Reject campaign or negotiate higher eCPM
Campaign B: Mid-tier performance campaign
Inputs:
- eCPM: $4.50 (actual paid)
- Target ROI: 150% (1.5x)
- Max cost per gen: $0.015
- Campaign type: performance
- Expected CTR: 0.4%
- Domain tier: standard
Calculations:
- Max allowable cost = ($4.50 / 1000) / 1.5 = $0.0030
Recommendation:
- Model tier: budget (cost $0.0086)
- Token setting: low (400 tokens)
- Models: gemini-2-flash + flux-schnell
- Breakeven eCPM: $12.90 (at 1.5x ROI)
- Assessment: ⚠️ Not profitable (profit per impression = $0.0045 − $0.0086 = −$0.0041)
- Alternative: ultra-budget tier (Groq + flux-schnell, cost $0.0032) → profit $0.0013 per impression (~141% ROI, just under the 1.5x target)
Campaign C: Premium brand campaign
Inputs:
- eCPM: $6.50 (actual paid)
- Target ROI: 120% (1.2x)
- Max cost per gen: $0.020
- Campaign type: brand_awareness
- Expected CTR: 0.5% (high quality)
- Domain tier: premium
Calculations:
- Max allowable cost = ($6.50 / 1000) / 1.2 = $0.0054
Recommendation:
- Model tier: budget with medium token setting (800 tokens, cost $0.0095)
- Models: gemini-2-flash + flux-dev
- Assessment: ⚠️ Marginally unprofitable (cost $0.0095 > allowable $0.0054)
- Alternative: ultra-budget Groq + flux-schnell (cost $0.0032) → ✅ Profitable
- Profit per impression: $0.0065 − $0.0032 = $0.0033 (≈203% ROI)
This document proposes a two-tiered approach for intelligent model selection in contextual ad generation:
- Simple Approach (immediate): Rule-based routing for current production based on expected click value (eCPM) and domain reputation
- Advanced Approach (once online learning is live): Learned routing integrated with the quality predictor (Ĉ) and performance predictor (P̂)
Expected Impact:
- 50-70% cost reduction on low-value impressions
- Maintained quality on high-value impressions
- 2-5x faster response times for most requests
- Profitability threshold enforcement per impression
Contents:
- Problem Statement
- API Cost Analysis (2026)
- Simple Approach: Rule-Based Routing
- Advanced Approach: Learned Routing
- Fast Wins: Input Token Optimization
- ROI Analysis & Break-Even Scenarios
- Implementation Roadmap
- Risk Mitigation
Our production system (ControlledAd.py) serves contextual ads with human-in-the-loop approval:
- Fetch article (title + body)
- Generate embeddings (256d)
- Find anchor ad via similarity search
- Exploration trigger: fires when predefined categories or approved candidates fail the similarity threshold
- Brand safety check (LLM call on title + content)
- Generate candidate variants (LLM with mega-prompt: brand + styling + strategies + few-shot + safety instructions)
- Generate image
- Human approval → serve winning ad
Problem: We use expensive models uniformly regardless of:
- Expected click value (advertiser's willingness to pay)
- Domain quality/reputation (premium publishers vs low-traffic blogs)
- Content complexity (simple product ads vs nuanced brand campaigns)
Result: Unprofitable on low-eCPM impressions, over-engineered for simple contexts.
Dynamically select model tier (LLM size, image generation quality) based on:
- Expected click value (eCPM)
- Domain quality/reputation (premium publishers vs low-traffic blogs)
- Content complexity (simple product ads vs nuanced brand campaigns)
Constraint: Maintain quality standards while maximizing profit margin per impression.
| Model | Provider | Input Cost ($/1M tokens) | Output Cost ($/1M tokens) | Context | Speed | Use Case |
|---|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 400K | Fast | Premium tier |
| Gemini 3 Pro | Google | $2.00 | $12.00 | 200K | Fast | Premium tier |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M | Very Fast | Balanced tier |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M | Very Fast | Budget tier |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | Ultra Fast | Ultra-budget |
| Mixtral 8x7B | Groq | $0.27 | $0.27 | 32K | Very Fast | Budget alternative |
| Claude Haiku | Anthropic | $1.00 | $5.00 | 200K | Fast | Budget fallback |
Key Observations:
- Gemini 2.0 Flash is 20x cheaper than Gemini 3 Pro and ~18x cheaper than GPT-5.2 on input
- Groq inference is 35-40x cheaper than premium models with acceptable quality trade-offs
- Context caching available on Gemini (75% savings on repeated prompts, cache reads at 10% of input price)
- GPT-5.2 generates internal "thinking" tokens billed as output ($14/1M)
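To make the per-call economics concrete, here is a small cost helper. The 10% cache-read price follows the Gemini observation above; treating it as a generic parameter is my assumption, not a statement about every provider's billing:

```python
def call_cost(input_tokens: int, output_tokens: int,
              in_price: float, out_price: float,
              cached_frac: float = 0.0,
              cache_read_price_frac: float = 0.10) -> float:
    """Dollar cost of one LLM call; prices are $ per 1M tokens.
    cached_frac is the share of input tokens served from a context cache,
    billed at cache_read_price_frac of the normal input price."""
    fresh = input_tokens * (1 - cached_frac) * in_price / 1e6
    cached = input_tokens * cached_frac * in_price * cache_read_price_frac / 1e6
    return fresh + cached + output_tokens * out_price / 1e6
```

With no caching, a 1,200-in / 150-out call on Gemini 2.0 Flash ($0.10/$0.40) costs $0.00018, matching the budget-tier cost table later in this document.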
Sources: see the pricing references at the end of this document.
| Model | Provider | Resolution | Cost per Image | Speed | Use Case |
|---|---|---|---|---|---|
| Imagen 3 | Google | 1024×1024 | $0.030 | ~8s | Premium tier |
| FLUX.1 [pro] | Replicate | 1024×1024 | $0.055 | ~10s | High quality |
| FLUX.1 [dev] | Replicate | 1024×1024 | $0.030 | ~6s | Balanced tier |
| FLUX.1 [schnell] | Replicate | 1024×1024 | $0.003 | ~2s | Budget tier |
Key Observations:
- Flux schnell is 10x cheaper than Imagen 3 with acceptable quality
- 2-3 second generation time enables real-time workflows
- Flux dev offers good balance (same price as Imagen 3, faster)
Sources: see the pricing references at the end of this document.
Production mega-prompt structure:
- Brand description: ~200 tokens
- Styling instructions: ~300 tokens
- Strategy guidelines: ~200 tokens
- Few-shot examples (3-5 examples): ~500 tokens
- Safety instructions: ~150 tokens
- Article content (full): ~2500 tokens
- Total input: ~3,850 tokens
Brand safety call:
- System prompt: ~200 tokens
- Article (title + content): ~2500 tokens
- Total safety check: ~2,700 tokens
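Tallying the components above (variable names are mine, for illustration):

```python
# Token budget for one generation call, from the breakdown above
mega_prompt = {'brand': 200, 'styling': 300, 'strategies': 200,
               'few_shot': 500, 'safety': 150}
generation_input = sum(mega_prompt.values()) + 2500  # + full article
safety_input = 200 + 2500                            # safety prompt + article
```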
Scenario: Generate 1 contextual ad with exploration triggered
| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | GPT-5.2 (current prod) | $0.00543 |
| Article embedding | 2,500 input | text-embedding-3-small | $0.00005 |
| Tagline generation | 3,850 input + 150 output | GPT-5.2 or Gemini 3 Pro | $0.00884 |
| Image generation | 1 image | Imagen 3 | $0.03000 |
| Total (Premium) | | | $0.0443 |
Alternative (Budget):
| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | GPT-5.2 (unchanged) | $0.00543 |
| Article embedding | 800 input (title + para1) | text-embedding-3-small | $0.00002 |
| Tagline generation (compact) | 1,200 input + 150 output | Gemini 2.0 Flash | $0.00018 |
| Image generation | 1 image | Flux schnell | $0.00300 |
| Total (Budget) | | | $0.0086 |
Savings: 81% cost reduction per generation
Ultra-budget (Groq):
| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | Llama 3.1 8B (Groq) | $0.00014 |
| Tagline generation (compact) | 1,200 input + 150 output | Llama 3.1 8B (Groq) | $0.00007 |
| Image generation | 1 image | Flux schnell | $0.00300 |
| Total (Ultra-budget) | | | $0.0032 |
Savings: 92% cost reduction, 5x faster
```python
def select_model_tier(
    ecpm: float,           # Expected CPM ($/1000 impressions)
    domain_quality: str,   # 'premium' | 'standard' | 'low'
    content_length: int,   # Article word count
    campaign_type: str     # 'brand_awareness' | 'performance'
) -> dict:
    """
    Simple rule-based model selection.

    Returns:
        {
            'llm': str,
            'llm_tier': 'premium' | 'balanced' | 'budget',
            'image': str,
            'image_tier': 'premium' | 'balanced' | 'budget',
            'input_mode': 'full_article' | 'title_plus_para1',
            'max_cost': float
        }
    """
    # Profitability threshold (must cover at least 2x generation cost)
    MIN_ECPM_PREMIUM = 10.0   # $10 eCPM = $0.010 per impression
    MIN_ECPM_BALANCED = 3.0   # $3 eCPM = $0.003 per impression

    # Decision tree
    if ecpm >= MIN_ECPM_PREMIUM and domain_quality == 'premium':
        # High-value, premium publishers → best quality
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'imagen-3',
            'image_tier': 'premium',
            'input_mode': 'full_article',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0443
        }
    elif ecpm >= MIN_ECPM_BALANCED and domain_quality in ['premium', 'standard']:
        # Mid-value, good publishers → balanced
        return {
            'llm': 'gemini-3-flash',
            'llm_tier': 'balanced',
            'image': 'flux-dev',
            'image_tier': 'balanced',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0117
        }
    elif campaign_type == 'brand_awareness':
        # Brand campaigns → prioritize quality over cost
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'flux-dev',  # Balanced image sufficient
            'image_tier': 'balanced',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0159
        }
    else:
        # Low-value or unproven domains → budget
        return {
            'llm': 'gemini-2-flash',
            'llm_tier': 'budget',
            'image': 'flux-schnell',
            'image_tier': 'budget',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0086
        }
```

Data Sources (existing in production):
- Impression count (from the `impressions` table)
- CTR history (clicks / impressions per domain)
- Human approval rate (from `controlled_ads`: type=2 vs type=-1)
- Publisher whitelist/blacklist
Simple Heuristic:
```python
def classify_domain_quality(domain: str) -> str:
    """Classify domain based on historical stats."""
    stats = get_domain_stats(domain)
    if domain in PREMIUM_WHITELIST:
        return 'premium'
    if stats['impression_count'] > 10000 and stats['ctr'] > 0.02:
        return 'premium'
    if stats['impression_count'] > 1000 and stats['ctr'] > 0.01:
        return 'standard'
    return 'low'
```

Modification point before exploration trigger:
```python
def _trigger_exploration_async(self, selected_ad: Dict | None) -> None:
    """Trigger exploration with dynamic model selection."""
    # NEW: Select model tier before generation
    model_config = select_model_tier(
        ecpm=self.calculate_ecpm(),
        domain_quality=self.classify_domain(),
        content_length=len(self.article_text.split()),
        campaign_type=self.campaign_type
    )
    # Store config for exploration method to use
    self.model_config = model_config

    # Existing exploration logic...
    if self.cache.get_from_cache(self.key_lock_exploration):
        return
    self.cache.update_cache(
        self.key_lock_exploration,
        {'exploration_in_progress': 1},
        EXPIRATION_60_SEC
    )
    if self.exploration_method:
        async_call(self._execute_exploration_on_copy, selected_ad)
```

Traffic Distribution (estimated):
| Tier | % Traffic | Avg eCPM | Current Cost | New Cost | Savings |
|---|---|---|---|---|---|
| Premium | 15% | $12.00 | $0.0443 | $0.0443 | $0 |
| Balanced | 35% | $5.00 | $0.0443 | $0.0117 | $0.0326 |
| Budget | 50% | $1.50 | $0.0443 | $0.0086 | $0.0357 |
Total Savings: (0.35 × $0.0326) + (0.50 × $0.0357) = $0.0293 per impression (66% reduction)
Annual Impact (1M impressions/month):
- Current: $44,300/month
- New: $15,055/month
- Savings: $29,245/month ($350,940/year)
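The arithmetic above can be checked directly from the traffic table (small differences from the quoted $15,055 figure are rounding):

```python
# (traffic share, cost per generation) per tier, from the table above
mix = {'premium': (0.15, 0.0443), 'balanced': (0.35, 0.0117), 'budget': (0.50, 0.0086)}
current_cost = 0.0443

blended = sum(share * cost for share, cost in mix.values())  # blended cost per impression

monthly_impressions = 1_000_000
monthly_savings = (current_cost - blended) * monthly_impressions
reduction = 1 - blended / current_cost  # ≈ 66%
```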
Once the self-learning framework (Ĉ, P̂, DSPy) is operational, upgrade routing to use learned signals:
```python
def select_model_tier_learned(
    context: dict,                  # Brand, article, domain
    C_hat_threshold: float = 0.7,   # Quality predictor threshold
    P_hat_threshold: float = 0.02,  # Performance predictor threshold
    ecpm: float = None
) -> dict:
    """
    Learned model selection using quality and performance predictors.

    Key insight: If we predict high approval (Ĉ) and high CTR (P̂),
    it's worth investing in premium models. Otherwise, use budget.
    """
    # Quick quality pre-check using Ĉ on anchor ad
    anchor_quality = C_hat(context['brand'], context['article'], context['anchor'])

    # Predicted performance using P̂ on anchor
    predicted_ctr = P_hat(context['article'], context['anchor'])

    # Calculate expected value of premium vs budget generation
    premium_value = (
        predicted_ctr * 1.2 *   # Assume 20% CTR lift from premium models
        ecpm / 1000 -           # Revenue per impression
        0.0411                  # Premium cost
    )
    budget_value = (
        predicted_ctr *         # No CTR lift assumption
        ecpm / 1000 -           # Revenue per impression
        0.0035                  # Budget cost
    )

    # Decision: use premium only if EV is higher
    if premium_value > budget_value and anchor_quality > C_hat_threshold:
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'imagen-3',
            'image_tier': 'premium',
            'input_mode': 'full_article',
            'expected_value': premium_value,
            'reason': f'High quality ({anchor_quality:.2f}) + high CTR ({predicted_ctr:.3f})'
        }
    else:
        return {
            'llm': 'gemini-2-flash',
            'llm_tier': 'budget',
            'image': 'flux-schnell',
            'image_tier': 'budget',
            'input_mode': 'title_plus_para1',
            'expected_value': budget_value,
            'reason': f'Budget sufficient (quality={anchor_quality:.2f}, CTR={predicted_ctr:.3f})'
        }
```

Treat model tier selection as a contextual bandit problem:

- Context: (brand_id, domain_tier, content_category, article_length)
- Actions: (premium, balanced, budget)
- Reward: (revenue − cost) per impression
```python
import random
from collections import defaultdict

class ModelTierBandit:
    """Contextual bandit for model tier selection."""

    def __init__(self):
        self.policy = EpsilonGreedy(epsilon=0.1)  # assumed helper, defined elsewhere
        self.context_encoder = embed_context      # assumed helper, defined elsewhere
        self.Q_table = defaultdict(lambda: {'premium': 0.0, 'balanced': 0.0, 'budget': 0.0})

    def select_tier(self, context: dict) -> str:
        """Select model tier using ε-greedy policy."""
        context_key = self.context_encoder(context)
        if random.random() < self.policy.epsilon:
            return random.choice(['premium', 'balanced', 'budget'])
        return max(self.Q_table[context_key], key=self.Q_table[context_key].get)

    def update(self, context: dict, tier: str, reward: float):
        """Update Q-value after observing reward."""
        context_key = self.context_encoder(context)
        alpha = 0.1  # Learning rate
        old_Q = self.Q_table[context_key][tier]
        self.Q_table[context_key][tier] = old_Q + alpha * (reward - old_Q)
```

Typical production prompt:
- Mega-prompt components: ~1,350 tokens
  - Brand description: 200
  - Styling instructions: 300
  - Strategy guidelines: 200
  - Few-shot examples: 500
  - Safety instructions: 150
- Article (full): ~2,500 tokens
- Total input: ~3,850 tokens
Brand safety call:
- Article (title + content): ~2,500 tokens
- Safety prompt: ~200 tokens
- Total: ~2,700 tokens
Reduced input:
- Mega-prompt components: ~1,350 tokens (same)
- Article (title + para1): ~400 tokens
- Total input: ~1,750 tokens
Brand safety call (unchanged):
- Still uses full article for safety: ~2,700 tokens
Savings: 54% input token reduction on generation (safety unchanged for quality)
| Model | Full Article Cost | Compact Cost | Savings |
|---|---|---|---|
| GPT-5.2 | $0.00884 | $0.00516 | $0.00368 (42%) |
| Gemini 3 Pro | $0.00950 | $0.00530 | $0.00420 (44%) |
| Gemini 2.0 Flash | $0.00045 | $0.00024 | $0.00021 (47%) |

(Savings on total cost run below the 54% input reduction because output token costs are unchanged.)
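These figures can be reproduced from the pricing table; the assumed 150 output tokens are unchanged between modes, which is why the percentage savings on total cost fall short of the 54% input-token reduction:

```python
PRICES = {  # $ per 1M tokens (input, output), from the pricing table above
    'GPT-5.2':          (1.75, 14.00),
    'Gemini 3 Pro':     (2.00, 12.00),
    'Gemini 2.0 Flash': (0.10, 0.40),
}

def gen_cost(model: str, input_tokens: int, output_tokens: int = 150) -> float:
    """Cost of one generation call in dollars."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1e6

for model in PRICES:
    full, compact = gen_cost(model, 3850), gen_cost(model, 1750)
    print(f"{model}: full ${full:.5f}, compact ${compact:.5f}, "
          f"saves {100 * (full - compact) / full:.0f}%")
```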
```python
def build_compact_prompt(self, context: dict) -> str:
    """Build prompt using only title + first paragraph."""
    article_title = context['article']['title']
    article_body = context['article']['body']

    # Extract first paragraph (split by \n\n or first 150 words)
    first_paragraph = self.extract_first_paragraph(article_body, max_words=150)

    # Mega-prompt components (unchanged)
    mega_prompt = self.build_mega_prompt_base(context['brand'])

    prompt = f"""{mega_prompt}

Article title: {article_title}
Article excerpt: {first_paragraph}
Anchor tagline: {context['anchor']['tagline']}

Generate contextual tagline variant following brand guidelines above.
Tagline:
"""
    return prompt
```

Profit per impression:

profit_per_impression = (eCPM / 1000) − generation_cost
Or for CPC campaigns:

profit_per_impression = (CTR × CPC) − generation_cost
| Model Tier | Cost per Generation | Break-even eCPM (2× margin) | Break-even CPC (1% CTR, 1× margin) |
|---|---|---|---|
| Premium (GPT-5.2 + Imagen) | $0.0443 | $88.60 | $4.43 |
| Balanced (Gemini 3 Flash + Flux) | $0.0117 | $23.40 | $1.17 |
| Budget (Gemini 2 Flash + Flux) | $0.0086 | $17.20 | $0.86 |
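The break-even figures follow directly from the tier costs (a sketch, assuming one generation per impression):

```python
def breakeven_ecpm(cost_per_gen: float, margin: float = 2.0) -> float:
    # eCPM at which revenue per impression covers margin x generation cost
    return cost_per_gen * margin * 1000

def breakeven_cpc(cost_per_gen: float, ctr: float = 0.01, margin: float = 1.0) -> float:
    # CPC at which CTR x CPC covers margin x generation cost
    return cost_per_gen * margin / ctr

# Premium tier: breakeven_ecpm(0.0443) ≈ 88.6, breakeven_cpc(0.0443) ≈ 4.43
```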
Interpretation:
- Premium tier needs roughly $89+ eCPM to clear a 2× margin
- Budget tier clears a 2× margin at $17.20 eCPM (simple breakeven at $8.60)
- Ultra-budget (Groq + Flux schnell, $0.0032/gen) breaks even around $3.20 eCPM
Scenario 1: Mid-value campaign (eCPM = $5, CTR = 1.5%)
| Tier | Cost | Revenue | Profit | ROI |
|---|---|---|---|---|
| Premium | $0.0443 | $0.0050 | -$0.0393 | -88.7% |
| Balanced | $0.0117 | $0.0050 | -$0.0067 | -57.3% |
| Budget | $0.0086 | $0.0050 | -$0.0036 | -41.9% |
Conclusion: At $5 eCPM no tier is profitable per impression; the budget tier simply loses the least.
Scenario 2: Premium publisher (CPC = $8, CTR = 2.5%)
| Tier | Cost | Revenue (CTR × CPC) | Profit | ROI |
|---|---|---|---|---|
| Premium | $0.0443 | $0.20 | $0.1557 | +351.5% |
| Balanced | $0.0117 | $0.20 | $0.1883 | +1609.4% |
| Budget | $0.0086 | $0.20 × 0.95 | $0.1814 | +2109.3% |
Insight: Even with -5% quality penalty, budget tier delivers highest ROI. Premium justified only for brand-sensitive campaigns.
Deliverables:
- `select_model_tier()` function with eCPM + domain quality routing
- Domain quality classifier (premium/standard/low)
- Integration into `ControlledAd._trigger_exploration_async()`
- Logging: model_tier, generation_cost, decision_reason
Success criteria:
- 50% of traffic routed to budget tier
- No drop in approval rate
- Cost savings confirmed
Deliverables:
- `build_compact_prompt()` using title + para1
- A/B test framework (50/50 split)
- Quality monitoring dashboard
Success criteria:
- <5% approval rate drop
- <3% CTR drop
- 54% input token savings confirmed
Deliverables:
- Historical analysis: profit vs tier by campaign
- Per-campaign threshold learning
- Threshold update automation
Success criteria:
- 10% additional profit vs fixed thresholds
- Thresholds stable (not oscillating)
Deliverables:
- `select_model_tier_learned()` using Ĉ and P̂
- Expected value calculation framework
- Bandit policy for exploration
Prerequisites:
- Ĉ (quality predictor) trained and deployed
- P̂ (performance predictor) trained and deployed
- Propensity logging operational
Success criteria:
- 15% profit improvement vs rule-based
- Bandit policy converges
Risk: Budget models produce lower quality, reducing approval rate and CTR.
Mitigation:
- Start with conservative thresholds (only low-value traffic to budget)
- Monitor approval rate daily, alert if <80%
- Circuit breaker: auto-revert to premium if approval drops >10%
- Human review sample: 100 budget-generated ads for manual QA
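One way the circuit breaker might be sketched (the function name, baseline, and thresholds are assumptions mirroring the mitigations above):

```python
def should_revert_to_premium(approval_history: list,
                             baseline: float = 0.90,
                             floor: float = 0.80,
                             max_relative_drop: float = 0.10) -> bool:
    """Auto-revert if the budget-tier approval rate falls below an absolute
    floor, or drops more than 10% relative to the premium baseline.
    approval_history is a list of 1/0 approval outcomes."""
    if not approval_history:
        return False
    rate = sum(approval_history) / len(approval_history)
    return rate < floor or (baseline - rate) / baseline > max_relative_drop
```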
Risk: eCPM thresholds miscalibrated, losing money on expensive generations.
Mitigation:
- Default to budget tier unless eCPM exceeds 2× generation cost
- Continuous profit analysis per tier
- Threshold adjustment automation
Risk: Primary model down or rate-limited, fallback needed.
Mitigation:
- Fallback chain: Gemini 2.0 Flash → Groq Llama → Claude Haiku
- Cache model availability status (Redis, 1min TTL)
- Alert if fallback rate >5%
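A minimal sketch of the fallback chain (model identifiers and the error-handling contract of `call_model` are assumptions; in production you would catch provider-specific exceptions rather than bare `Exception`):

```python
FALLBACK_CHAIN = ['gemini-2.0-flash', 'groq-llama-3.1-8b', 'claude-haiku']

def generate_with_fallback(prompt: str, call_model, chain=FALLBACK_CHAIN):
    """Try each model in order; call_model(model, prompt) is assumed to raise
    on outage or rate limiting. Returns (model_used, response)."""
    last_err = None
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as err:  # narrow this to provider errors in production
            last_err = err
    raise RuntimeError('all models in fallback chain failed') from last_err
```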
| Model | Provider | Input | Output | Speed | Context |
|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | Fast | 400K |
| Gemini 3 Pro | Google | $2.00 | $12.00 | Fast | 200K |
| Gemini 3 Flash | Google | $0.50 | $3.00 | Very Fast | 1M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | Very Fast | 1M |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | Ultra Fast | 128K |
| Mixtral 8x7B | Groq | $0.27 | $0.27 | Very Fast | 32K |
| Claude Haiku | Anthropic | $1.00 | $5.00 | Fast | 200K |
| Model | Provider | Resolution | Cost | Speed |
|---|---|---|---|---|
| Imagen 3 | Google | 1024×1024 | $0.030 | ~8s |
| FLUX.1 [pro] | Replicate | 1024×1024 | $0.055 | ~10s |
| FLUX.1 [dev] | Replicate | 1024×1024 | $0.030 | ~6s |
| FLUX.1 [schnell] | Replicate | 1024×1024 | $0.003 | ~2s |
- GPT-5.2 API Pricing
- GPT-5.2 Pricing Calculator
- Gemini API Pricing
- Gemini 3 Pricing Guide
- Groq Pricing
- Replicate Pricing
- Claude API Pricing
- AI Image Model Pricing
End of Document