Author: Albert Bentov Date: 2026-02-11 Status: Design Proposal
Before implementing ML-based routing, start with a spreadsheet-based interface per campaign where users input economics and receive deterministic model + token recommendations.
User Inputs (per campaign):
| Parameter | Description | Example Value |
|---|---|---|
| ecpm | Actual revenue per 1000 impressions ($) | $3.50 |
| target_roi | Required return on investment (%) | 150% (1.5x) |
| campaign_type | brand_awareness or performance | performance |
| max_cost_per_gen | Budget constraint per generation ($) | $0.010 |
| expected_ctr | Target click-through rate (%) | 0.4% |
| domain_tier | premium, standard, or low | standard |
| monthly_impressions | Forecasted impression volume | 800,000 |
System Recommendations (output):
| Output | Description | Example Value |
|---|---|---|
| model_tier | premium, balanced, or budget | budget |
| token_setting | Input context size | low (title + para1) |
| llm_model | Specific model to use | gemini-2-flash |
| image_model | Image generation model | flux-schnell |
| cost_per_generation | Expected generation cost | $0.0086 |
| breakeven_ecpm | Minimum eCPM for profitability at target ROI | $12.90 |
| monthly_cost | Projected monthly spend | $430 |
| monthly_profit | Projected profit (eCPM × volume − cost) | $-255 |
| recommendation | Profitability assessment | ⚠️ Not profitable at current eCPM |
Token setting options:

| Setting | Article Input | Input Tokens | Use Case |
|---|---|---|---|
| high | Full article (title + body) | ~2500 tokens | Premium campaigns, complex content |
| medium | Title + first 2-3 paragraphs | ~800 tokens | Balanced quality/cost |
| low | Title + first paragraph | ~400 tokens | Budget campaigns, simple content |
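As a sketch of how these settings could be applied when assembling model input (the function name and the blank-line paragraph-splitting rule are assumptions, not production code):

```python
def build_article_input(title: str, body: str, setting: str = 'low') -> str:
    """Assemble the article input per the token-setting table above.
    Assumes paragraphs are separated by blank lines."""
    paragraphs = [p.strip() for p in body.split('\n\n') if p.strip()]
    if setting == 'high':        # full article
        return f"{title}\n\n{body}"
    if setting == 'medium':      # title + first 2-3 paragraphs
        return f"{title}\n\n" + '\n\n'.join(paragraphs[:3])
    # low: title + first paragraph
    return f"{title}\n\n{paragraphs[0]}" if paragraphs else title
```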
```python
def recommend_model_config(
    ecpm: float,
    target_roi: float,
    max_cost_per_gen: float,
    campaign_type: str,
    domain_tier: str
) -> dict:
    """
    Deterministic model + token recommendation based on campaign economics.
    Returns config with profitability check and provider tier bins.
    """
    # Calculate max allowable cost per generation
    max_cost = (ecpm / 1000) / target_roi
    max_cost = min(max_cost, max_cost_per_gen)  # Respect budget constraint

    # Decision tree with provider tier bins
    if max_cost >= 0.040 and (domain_tier == 'premium' or campaign_type == 'brand_awareness'):
        return {
            'model_tier': 'premium',
            'token_setting': 'high',
            'llm_providers': 'OpenAI GPT (flagship), Anthropic Claude (flagship), Google Gemini (pro tier)',
            'image_providers': 'Google Imagen, Stability AI (premium), Midjourney',
            'cost_range': '$0.040 - $0.050',
            'cost_per_generation': 0.0443,
            'profitable': max_cost >= 0.0443
        }
    elif max_cost >= 0.010:
        return {
            'model_tier': 'balanced',
            'token_setting': 'medium',
            'llm_providers': 'Google Gemini (flash tier), Anthropic Claude (fast tier), Together AI',
            'image_providers': 'Replicate FLUX (dev tier), Stability AI (standard)',
            'cost_range': '$0.010 - $0.015',
            'cost_per_generation': 0.0117,
            'profitable': max_cost >= 0.0117
        }
    elif max_cost >= 0.003:
        return {
            'model_tier': 'budget',
            'token_setting': 'low',
            'llm_providers': 'Google Gemini (flash-lite), Fireworks AI, Anyscale',
            'image_providers': 'Replicate FLUX (fast tier), Leonardo AI (fast)',
            'cost_range': '$0.003 - $0.010',
            'cost_per_generation': 0.0086,
            'profitable': max_cost >= 0.0086
        }
    else:
        return {
            'model_tier': 'ultra-budget',
            'token_setting': 'low',
            'llm_providers': 'Groq (Llama/Mixtral), Together AI (open models), Hugging Face Inference',
            'image_providers': 'Replicate FLUX (schnell), Stability AI (turbo)',
            'cost_range': '$0.001 - $0.003',
            'cost_per_generation': 0.0032,
            'profitable': max_cost >= 0.0032
        }
```

Text Generation:
- OpenAI - GPT models (premium tier, high quality)
- Anthropic - Claude models (premium tier, reasoning-focused)
- Google - Gemini models (pro/flash tiers, good balance)
- Groq - Open models (Llama, Mixtral) with ultra-fast inference
- Together AI - Open source models, competitive pricing
- Fireworks AI - Fast inference, budget-friendly
- Anyscale - Llama/Mistral hosting, good for scale
- Replicate - Various open models
- Hugging Face - Inference API for open models
Image Generation:
- Google - Imagen models (premium quality)
- Stability AI - Stable Diffusion variants (multiple tiers)
- Replicate - FLUX, Stable Diffusion, various models
- Midjourney - High quality (limited API access)
- Leonardo AI - Fast generation, multiple style presets
- RunwayML - Creative AI tools, image generation
- OpenAI - DALL-E models (premium tier)
To populate the interface and validate economics, collect these metrics:
From the `generations` table (or equivalent):

```sql
-- Cost per generation by campaign
SELECT campaign_id,
       COUNT(*) as generation_count,
       AVG(model_cost) as avg_cost_per_gen,  -- if tracked
       AVG(token_count_input) as avg_input_tokens,
       AVG(token_count_output) as avg_output_tokens
FROM generations
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id;
```

From the `impressions` table:

```sql
-- Impressions and CTR by campaign/domain
SELECT campaign_id,
       domain,
       COUNT(*) as impression_count,
       SUM(CASE WHEN clicked = 1 THEN 1 ELSE 0 END) as click_count,
       AVG(CASE WHEN clicked = 1 THEN 1.0 ELSE 0.0 END) as ctr,
       AVG(ecpm) as avg_ecpm  -- if available
FROM impressions
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id, domain;
```

From RTB win data (if available):

```sql
-- Actual revenue per impression
SELECT campaign_id,
       AVG(win_price) as avg_win_price,
       COUNT(*) as win_count,
       AVG(bid_price) as avg_bid_price
FROM rtb_wins
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id;
```

From the `controlled_ads` table:

```sql
-- Human approval rate (quality proxy)
SELECT campaign_id,
       COUNT(*) as total_generated,
       SUM(CASE WHEN type = 2 THEN 1 ELSE 0 END) as approved_count,
       AVG(CASE WHEN type = 2 THEN 1.0 ELSE 0.0 END) as approval_rate
FROM controlled_ads
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id;
```
- Spreadsheet template (Google Sheets/Excel) with formulas:
  - Input cells: eCPM, ROI target, campaign type, etc.
  - Calculation cells: max allowable cost, breakeven eCPM
  - Recommendation cells: model tier, token setting, profitability check
- Simple Python script (`recommend_config.py`):
  - Read campaign params from CSV/JSON
  - Apply decision logic
  - Output recommended config + economics forecast
  - Flag unprofitable campaigns
- Dashboard integration (future):
  - Per-campaign config UI
  - Real-time cost/profit tracking
  - One-click apply to production
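A minimal sketch of the `recommend_config.py` flow (the CSV column names and the one-generation-per-impression assumption in the breakeven formula are mine, not production contracts):

```python
import csv
import json

def breakeven_ecpm(cost_per_gen: float, target_roi: float) -> float:
    # Minimum eCPM at which revenue per impression covers
    # target_roi x generation cost, assuming one generation per impression
    return cost_per_gen * target_roi * 1000

def recommend(row: dict) -> dict:
    """Apply the tier bins from recommend_model_config to one campaign row."""
    ecpm = float(row['ecpm'])
    max_cost = min(ecpm / 1000 / float(row['target_roi']),
                   float(row['max_cost_per_gen']))
    tier = ('premium' if max_cost >= 0.040 else
            'balanced' if max_cost >= 0.010 else
            'budget' if max_cost >= 0.003 else 'ultra-budget')
    return {'campaign_id': row['campaign_id'], 'tier': tier,
            'max_cost': round(max_cost, 5)}

def main(path: str) -> None:
    # Read campaign params from CSV, print one JSON recommendation per line
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            print(json.dumps(recommend(row)))
```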
Campaign A: Low-tier performance campaign
Inputs:
- eCPM: $0.80 (actual paid)
- Target ROI: 150% (1.5x)
- Max cost per gen: $0.010
- Campaign type: performance
- Expected CTR: 0.3%
- Domain tier: low
Calculations:
- Max allowable cost = ($0.80 / 1000) / 1.5 = $0.00053
- Budget constraint: $0.010
Recommendation:
- Model tier: budget (cost $0.0086)
- Token setting: low (400 tokens)
- Models: gemini-2-flash + flux-schnell
- Breakeven eCPM: $12.90 (at 1.5x ROI)
- Assessment: ⚠️ Not profitable (eCPM $0.80 is far below the $12.90 breakeven)
- Suggestion: Reject campaign or negotiate higher eCPM
Campaign B: Mid-tier performance campaign
Inputs:
- eCPM: $4.50 (actual paid)
- Target ROI: 150% (1.5x)
- Max cost per gen: $0.015
- Campaign type: performance
- Expected CTR: 0.4%
- Domain tier: standard
Calculations:
- Max allowable cost = ($4.50 / 1000) / 1.5 = $0.0030
Recommendation:
- Model tier: budget (cost $0.0086)
- Token setting: low (400 tokens)
- Models: gemini-2-flash + flux-schnell
- Breakeven eCPM: $12.90 (at 1.5x ROI)
- Assessment: ⚠️ Not profitable (profit per impression = $0.0045 − $0.0086 = −$0.0041)
- Alternative: ultra-budget tier (Groq + flux-schnell, cost $0.0032) → profit $0.0013 per impression (~141% ROI, just under the 1.5x target)
Campaign C: Premium brand campaign
Inputs:
- eCPM: $6.50 (actual paid)
- Target ROI: 120% (1.2x)
- Max cost per gen: $0.020
- Campaign type: brand_awareness
- Expected CTR: 0.5% (high quality)
- Domain tier: premium
Calculations:
- Max allowable cost = ($6.50 / 1000) / 1.2 = $0.0054
Recommendation:
- Model tier: budget with medium token setting (800 tokens, cost $0.0095)
- Models: gemini-2-flash + flux-dev
- Assessment: ⚠️ Marginally unprofitable (cost $0.0095 > allowable $0.0054)
- Alternative: ultra-budget Groq + flux-schnell (cost $0.0032) → ✅ Profitable
- Profit per impression: $0.0065 − $0.0032 = $0.0033 (≈203% ROI)
This document proposes a two-tiered approach for intelligent model selection in contextual ad generation:
- Simple Approach (immediate): Rule-based routing for current production based on expected click value (eCPM) and domain reputation
- Advanced Approach (once online learning is live): Learned routing integrated with the quality predictor (Ĉ) and performance predictor (P̂)
Expected Impact:
- 50-70% cost reduction on low-value impressions
- Maintained quality on high-value impressions
- 2-5x faster response times for most requests
- Profitability threshold enforcement per impression
Contents:
- Problem Statement
- API Cost Analysis (2026)
- Simple Approach: Rule-Based Routing
- Advanced Approach: Learned Routing
- Fast Wins: Input Token Optimization
- ROI Analysis & Break-Even Scenarios
- Implementation Roadmap
- Risk Mitigation
Our production system (ControlledAd.py) serves contextual ads with human-in-the-loop approval:
- Fetch article (title + body)
- Generate embeddings (256d)
- Find anchor ad via similarity search
- Exploration trigger: fires when predefined categories or approved candidates fail the similarity threshold
- Brand safety check (LLM call on title + content)
- Generate candidate variants (LLM with mega-prompt: brand + styling + strategies + few-shot + safety instructions)
- Generate image
- Human approval → serve winning ad
Problem: We use expensive models uniformly regardless of:
- Expected click value (advertiser's willingness to pay)
- Domain quality/reputation (premium publishers vs low-traffic blogs)
- Content complexity (simple product ads vs nuanced brand campaigns)
Result: Unprofitable on low-eCPM impressions, over-engineered for simple contexts.
Dynamically select model tier (LLM size, image generation quality) based on:
- Expected click value (eCPM)
- Domain quality/reputation (premium publishers vs low-traffic blogs)
- Content complexity (simple product ads vs nuanced brand campaigns)
Constraint: Maintain quality standards while maximizing profit margin per impression.
| Model | Provider | Input Cost ($/1M tokens) | Output Cost ($/1M tokens) | Context | Speed | Use Case |
|---|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 400K | Fast | Premium tier |
| Gemini 3 Pro | Google | $2.00 | $12.00 | 200K | Fast | Premium tier |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M | Very Fast | Balanced tier |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M | Very Fast | Budget tier |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | Ultra Fast | Ultra-budget |
| Mixtral 8x7B | Groq | $0.27 | $0.27 | 32K | Very Fast | Budget alternative |
| Claude Haiku | Anthropic | $1.00 | $5.00 | 200K | Fast | Budget fallback |
Key Observations:
- Gemini 2.0 Flash is 20x cheaper than Gemini 3 Pro and ~18x cheaper than GPT-5.2 on input
- Groq inference is 35-40x cheaper than premium models with acceptable quality trade-offs
- Context caching available on Gemini (75% savings on repeated prompts, cache reads at 10% of input price)
- GPT-5.2 generates internal "thinking" tokens billed as output ($14/1M)
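To make the per-call economics concrete, here is a small cost helper. The 10% cache-read price follows the Gemini observation above; treating it as a generic parameter is my assumption, not a statement about every provider's billing:

```python
def call_cost(input_tokens: int, output_tokens: int,
              in_price: float, out_price: float,
              cached_frac: float = 0.0,
              cache_read_price_frac: float = 0.10) -> float:
    """Dollar cost of one LLM call; prices are $ per 1M tokens.
    cached_frac is the share of input tokens served from a context cache,
    billed at cache_read_price_frac of the normal input price."""
    fresh = input_tokens * (1 - cached_frac) * in_price / 1e6
    cached = input_tokens * cached_frac * in_price * cache_read_price_frac / 1e6
    return fresh + cached + output_tokens * out_price / 1e6
```

With no caching, a 1,200-in / 150-out call on Gemini 2.0 Flash ($0.10/$0.40) costs $0.00018, matching the budget-tier cost table later in this document.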
Sources: see the pricing references at the end of this document.
| Model | Provider | Resolution | Cost per Image | Speed | Use Case |
|---|---|---|---|---|---|
| Imagen 3 | Google | 1024×1024 | $0.030 | ~8s | Premium tier |
| FLUX.1 [pro] | Replicate | 1024×1024 | $0.055 | ~10s | High quality |
| FLUX.1 [dev] | Replicate | 1024×1024 | $0.030 | ~6s | Balanced tier |
| FLUX.1 [schnell] | Replicate | 1024×1024 | $0.003 | ~2s | Budget tier |
Key Observations:
- Flux schnell is 10x cheaper than Imagen 3 with acceptable quality
- 2-3 second generation time enables real-time workflows
- Flux dev offers good balance (same price as Imagen 3, faster)
Sources: see the pricing references at the end of this document.
Production mega-prompt structure:
- Brand description: ~200 tokens
- Styling instructions: ~300 tokens
- Strategy guidelines: ~200 tokens
- Few-shot examples (3-5 examples): ~500 tokens
- Safety instructions: ~150 tokens
- Article content (full): ~2500 tokens
- Total input: ~3,850 tokens
Brand safety call:
- System prompt: ~200 tokens
- Article (title + content): ~2500 tokens
- Total safety check: ~2,700 tokens
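Tallying the components above (variable names are mine, for illustration):

```python
# Token budget for one generation call, from the breakdown above
mega_prompt = {'brand': 200, 'styling': 300, 'strategies': 200,
               'few_shot': 500, 'safety': 150}
generation_input = sum(mega_prompt.values()) + 2500  # + full article
safety_input = 200 + 2500                            # safety prompt + article
```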
Scenario: Generate 1 contextual ad with exploration triggered
| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | GPT-5.2 (current prod) | $0.00543 |
| Article embedding | 2,500 input | text-embedding-3-small | $0.00005 |
| Tagline generation | 3,850 input + 150 output | GPT-5.2 or Gemini 3 Pro | $0.00884 |
| Image generation | 1 image | Imagen 3 | $0.03000 |
| Total (Premium) | | | $0.0443 |
Alternative (Budget):
| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | GPT-5.2 (unchanged) | $0.00543 |
| Article embedding | 800 input (title + para1) | text-embedding-3-small | $0.00002 |
| Tagline generation (compact) | 1,200 input + 150 output | Gemini 2.0 Flash | $0.00018 |
| Image generation | 1 image | Flux schnell | $0.00300 |
| Total (Budget) | | | $0.0086 |
Savings: 81% cost reduction per generation
Ultra-budget (Groq):
| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | Llama 3.1 8B (Groq) | $0.00014 |
| Tagline generation (compact) | 1,200 input + 150 output | Llama 3.1 8B (Groq) | $0.00007 |
| Image generation | 1 image | Flux schnell | $0.00300 |
| Total (Ultra-budget) | | | $0.0032 |
Savings: 92% cost reduction, 5x faster
```python
def select_model_tier(
    ecpm: float,           # Expected CPM ($/1000 impressions)
    domain_quality: str,   # 'premium' | 'standard' | 'low'
    content_length: int,   # Article word count
    campaign_type: str     # 'brand_awareness' | 'performance'
) -> dict:
    """
    Simple rule-based model selection.

    Returns:
        {
            'llm': str,
            'llm_tier': 'premium' | 'balanced' | 'budget',
            'image': str,
            'image_tier': 'premium' | 'balanced' | 'budget',
            'input_mode': 'full_article' | 'title_plus_para1',
            'max_cost': float
        }
    """
    # Profitability threshold (must cover at least 2x generation cost)
    MIN_ECPM_PREMIUM = 10.0   # $10 eCPM = $0.010 per impression
    MIN_ECPM_BALANCED = 3.0   # $3 eCPM = $0.003 per impression

    # Decision tree
    if ecpm >= MIN_ECPM_PREMIUM and domain_quality == 'premium':
        # High-value, premium publishers → best quality
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'imagen-3',
            'image_tier': 'premium',
            'input_mode': 'full_article',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0443
        }
    elif ecpm >= MIN_ECPM_BALANCED and domain_quality in ['premium', 'standard']:
        # Mid-value, good publishers → balanced
        return {
            'llm': 'gemini-3-flash',
            'llm_tier': 'balanced',
            'image': 'flux-dev',
            'image_tier': 'balanced',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0117
        }
    elif campaign_type == 'brand_awareness':
        # Brand campaigns → prioritize quality over cost
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'flux-dev',  # Balanced image sufficient
            'image_tier': 'balanced',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0159
        }
    else:
        # Low-value or unproven domains → budget
        return {
            'llm': 'gemini-2-flash',
            'llm_tier': 'budget',
            'image': 'flux-schnell',
            'image_tier': 'budget',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0086
        }
```

Data Sources (existing in production):
- Impression count (from the `impressions` table)
- CTR history (clicks / impressions per domain)
- Human approval rate (from `controlled_ads`: type=2 vs type=-1)
- Publisher whitelist/blacklist
Simple Heuristic:
```python
def classify_domain_quality(domain: str) -> str:
    """Classify domain based on historical stats."""
    stats = get_domain_stats(domain)
    if domain in PREMIUM_WHITELIST:
        return 'premium'
    if stats['impression_count'] > 10000 and stats['ctr'] > 0.02:
        return 'premium'
    if stats['impression_count'] > 1000 and stats['ctr'] > 0.01:
        return 'standard'
    return 'low'
```

Modification point before exploration trigger:
```python
def _trigger_exploration_async(self, selected_ad: Dict | None) -> None:
    """Trigger exploration with dynamic model selection."""
    # NEW: Select model tier before generation
    model_config = select_model_tier(
        ecpm=self.calculate_ecpm(),
        domain_quality=self.classify_domain(),
        content_length=len(self.article_text.split()),
        campaign_type=self.campaign_type
    )
    # Store config for exploration method to use
    self.model_config = model_config

    # Existing exploration logic...
    if self.cache.get_from_cache(self.key_lock_exploration):
        return
    self.cache.update_cache(
        self.key_lock_exploration,
        {'exploration_in_progress': 1},
        EXPIRATION_60_SEC
    )
    if self.exploration_method:
        async_call(self._execute_exploration_on_copy, selected_ad)
```

Traffic Distribution (estimated):
| Tier | % Traffic | Avg eCPM | Current Cost | New Cost | Savings |
|---|---|---|---|---|---|
| Premium | 15% | $12.00 | $0.0443 | $0.0443 | $0 |
| Balanced | 35% | $5.00 | $0.0443 | $0.0117 | $0.0326 |
| Budget | 50% | $1.50 | $0.0443 | $0.0086 | $0.0357 |
Total Savings: (0.35 × $0.0326) + (0.50 × $0.0357) = $0.0293 per impression (66% reduction)
Annual Impact (1M impressions/month):
- Current: $44,300/month
- New: $15,055/month
- Savings: $29,245/month ($350,940/year)
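The arithmetic above can be checked directly from the traffic table (small differences from the quoted $15,055 figure are rounding):

```python
# (traffic share, cost per generation) per tier, from the table above
mix = {'premium': (0.15, 0.0443), 'balanced': (0.35, 0.0117), 'budget': (0.50, 0.0086)}
current_cost = 0.0443

blended = sum(share * cost for share, cost in mix.values())  # blended cost per impression

monthly_impressions = 1_000_000
monthly_savings = (current_cost - blended) * monthly_impressions
reduction = 1 - blended / current_cost  # ≈ 66%
```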
Once the self-learning framework (Ĉ, P̂, DSPy) is operational, upgrade routing to use learned signals:
```python
def select_model_tier_learned(
    context: dict,                  # Brand, article, domain
    C_hat_threshold: float = 0.7,   # Quality predictor threshold
    P_hat_threshold: float = 0.02,  # Performance predictor threshold
    ecpm: float = None
) -> dict:
    """
    Learned model selection using quality and performance predictors.

    Key insight: If we predict high approval (Ĉ) and high CTR (P̂),
    it's worth investing in premium models. Otherwise, use budget.
    """
    # Quick quality pre-check using Ĉ on anchor ad
    anchor_quality = C_hat(context['brand'], context['article'], context['anchor'])

    # Predicted performance using P̂ on anchor
    predicted_ctr = P_hat(context['article'], context['anchor'])

    # Calculate expected value of premium vs budget generation
    premium_value = (
        predicted_ctr * 1.2 *   # Assume 20% CTR lift from premium models
        ecpm / 1000 -           # Revenue per impression
        0.0411                  # Premium cost
    )
    budget_value = (
        predicted_ctr *         # No CTR lift assumption
        ecpm / 1000 -           # Revenue per impression
        0.0035                  # Budget cost
    )

    # Decision: use premium only if EV is higher
    if premium_value > budget_value and anchor_quality > C_hat_threshold:
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'imagen-3',
            'image_tier': 'premium',
            'input_mode': 'full_article',
            'expected_value': premium_value,
            'reason': f'High quality ({anchor_quality:.2f}) + high CTR ({predicted_ctr:.3f})'
        }
    else:
        return {
            'llm': 'gemini-2-flash',
            'llm_tier': 'budget',
            'image': 'flux-schnell',
            'image_tier': 'budget',
            'input_mode': 'title_plus_para1',
            'expected_value': budget_value,
            'reason': f'Budget sufficient (quality={anchor_quality:.2f}, CTR={predicted_ctr:.3f})'
        }
```

Treat model tier selection as a contextual bandit problem:

- Context: (brand_id, domain_tier, content_category, article_length)
- Actions: (premium, balanced, budget)
- Reward: (revenue − cost) per impression
```python
import random
from collections import defaultdict

class ModelTierBandit:
    """Contextual bandit for model tier selection."""

    def __init__(self):
        self.policy = EpsilonGreedy(epsilon=0.1)  # assumed helper, defined elsewhere
        self.context_encoder = embed_context      # assumed helper, defined elsewhere
        self.Q_table = defaultdict(lambda: {'premium': 0.0, 'balanced': 0.0, 'budget': 0.0})

    def select_tier(self, context: dict) -> str:
        """Select model tier using ε-greedy policy."""
        context_key = self.context_encoder(context)
        if random.random() < self.policy.epsilon:
            return random.choice(['premium', 'balanced', 'budget'])
        return max(self.Q_table[context_key], key=self.Q_table[context_key].get)

    def update(self, context: dict, tier: str, reward: float):
        """Update Q-value after observing reward."""
        context_key = self.context_encoder(context)
        alpha = 0.1  # Learning rate
        old_Q = self.Q_table[context_key][tier]
        self.Q_table[context_key][tier] = old_Q + alpha * (reward - old_Q)
```

Typical production prompt:
- Mega-prompt components: ~1,350 tokens
  - Brand description: 200
  - Styling instructions: 300
  - Strategy guidelines: 200
  - Few-shot examples: 500
  - Safety instructions: 150
- Article (full): ~2,500 tokens
- Total input: ~3,850 tokens
Brand safety call:
- Article (title + content): ~2,500 tokens
- Safety prompt: ~200 tokens
- Total: ~2,700 tokens
Reduced input:
- Mega-prompt components: ~1,350 tokens (same)
- Article (title + para1): ~400 tokens
- Total input: ~1,750 tokens
Brand safety call (unchanged):
- Still uses full article for safety: ~2,700 tokens
Savings: 54% input token reduction on generation (safety unchanged for quality)
| Model | Full Article Cost | Compact Cost | Savings |
|---|---|---|---|
| GPT-5.2 | $0.00884 | $0.00516 | $0.00368 (42%) |
| Gemini 3 Pro | $0.00950 | $0.00530 | $0.00420 (44%) |
| Gemini 2.0 Flash | $0.00045 | $0.00024 | $0.00021 (47%) |

(Savings on total cost run below the 54% input reduction because output token costs are unchanged.)
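These figures can be reproduced from the pricing table; the assumed 150 output tokens are unchanged between modes, which is why the percentage savings on total cost fall short of the 54% input-token reduction:

```python
PRICES = {  # $ per 1M tokens (input, output), from the pricing table above
    'GPT-5.2':          (1.75, 14.00),
    'Gemini 3 Pro':     (2.00, 12.00),
    'Gemini 2.0 Flash': (0.10, 0.40),
}

def gen_cost(model: str, input_tokens: int, output_tokens: int = 150) -> float:
    """Cost of one generation call in dollars."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1e6

for model in PRICES:
    full, compact = gen_cost(model, 3850), gen_cost(model, 1750)
    print(f"{model}: full ${full:.5f}, compact ${compact:.5f}, "
          f"saves {100 * (full - compact) / full:.0f}%")
```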
```python
def build_compact_prompt(self, context: dict) -> str:
    """Build prompt using only title + first paragraph."""
    article_title = context['article']['title']
    article_body = context['article']['body']

    # Extract first paragraph (split by \n\n or first 150 words)
    first_paragraph = self.extract_first_paragraph(article_body, max_words=150)

    # Mega-prompt components (unchanged)
    mega_prompt = self.build_mega_prompt_base(context['brand'])

    prompt = f"""{mega_prompt}

Article title: {article_title}
Article excerpt: {first_paragraph}
Anchor tagline: {context['anchor']['tagline']}

Generate contextual tagline variant following brand guidelines above.
Tagline:
"""
    return prompt
```

Profit per impression:

profit_per_impression = (eCPM / 1000) − generation_cost
Or for CPC campaigns:

profit_per_impression = (CTR × CPC) − generation_cost
| Model Tier | Cost per Generation | Break-even eCPM (2× margin) | Break-even CPC (1% CTR, 1× margin) |
|---|---|---|---|
| Premium (GPT-5.2 + Imagen) | $0.0443 | $88.60 | $4.43 |
| Balanced (Gemini 3 Flash + Flux) | $0.0117 | $23.40 | $1.17 |
| Budget (Gemini 2 Flash + Flux) | $0.0086 | $17.20 | $0.86 |
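The break-even figures follow directly from the tier costs (a sketch, assuming one generation per impression):

```python
def breakeven_ecpm(cost_per_gen: float, margin: float = 2.0) -> float:
    # eCPM at which revenue per impression covers margin x generation cost
    return cost_per_gen * margin * 1000

def breakeven_cpc(cost_per_gen: float, ctr: float = 0.01, margin: float = 1.0) -> float:
    # CPC at which CTR x CPC covers margin x generation cost
    return cost_per_gen * margin / ctr

# Premium tier: breakeven_ecpm(0.0443) ≈ 88.6, breakeven_cpc(0.0443) ≈ 4.43
```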
Interpretation:
- Premium tier needs roughly $89+ eCPM to clear a 2× margin
- Budget tier clears a 2× margin at $17.20 eCPM (simple breakeven at $8.60)
- Ultra-budget (Groq + Flux schnell, $0.0032/gen) breaks even around $3.20 eCPM
Scenario 1: Mid-value campaign (eCPM = $5, CTR = 1.5%)
| Tier | Cost | Revenue | Profit | ROI |
|---|---|---|---|---|
| Premium | $0.0443 | $0.0050 | -$0.0393 | -88.7% |
| Balanced | $0.0117 | $0.0050 | -$0.0067 | -57.3% |
| Budget | $0.0086 | $0.0050 | -$0.0036 | -41.9% |
Conclusion: At $5 eCPM no tier is profitable per impression; the budget tier simply loses the least.
Scenario 2: Premium publisher (CPC = $8, CTR = 2.5%)
| Tier | Cost | Revenue (CTR × CPC) | Profit | ROI |
|---|---|---|---|---|
| Premium | $0.0443 | $0.20 | $0.1557 | +351.5% |
| Balanced | $0.0117 | $0.20 | $0.1883 | +1609.4% |
| Budget | $0.0086 | $0.20 × 0.95 | $0.1814 | +2109.3% |
Insight: Even with -5% quality penalty, budget tier delivers highest ROI. Premium justified only for brand-sensitive campaigns.
Deliverables:
- `select_model_tier()` function with eCPM + domain quality routing
- Domain quality classifier (premium/standard/low)
- Integration into `ControlledAd._trigger_exploration_async()`
- Logging: model_tier, generation_cost, decision_reason
Success criteria:
- 50% of traffic routed to budget tier
- No drop in approval rate
- Cost savings confirmed
Deliverables:
- `build_compact_prompt()` using title + para1
- A/B test framework (50/50 split)
- Quality monitoring dashboard
Success criteria:
- <5% approval rate drop
- <3% CTR drop
- 54% input token savings confirmed
Deliverables:
- Historical analysis: profit vs tier by campaign
- Per-campaign threshold learning
- Threshold update automation
Success criteria:
- 10% additional profit vs fixed thresholds
- Thresholds stable (not oscillating)
Deliverables:
- `select_model_tier_learned()` using Ĉ and P̂
- Expected value calculation framework
- Bandit policy for exploration
Prerequisites:
- Ĉ (quality predictor) trained and deployed
- P̂ (performance predictor) trained and deployed
- Propensity logging operational
Success criteria:
- 15% profit improvement vs rule-based
- Bandit policy converges
Risk: Budget models produce lower quality, reducing approval rate and CTR.
Mitigation:
- Start with conservative thresholds (only low-value traffic to budget)
- Monitor approval rate daily, alert if <80%
- Circuit breaker: auto-revert to premium if approval drops >10%
- Human review sample: 100 budget-generated ads for manual QA
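One way the circuit breaker might be sketched (the function name, baseline, and thresholds are assumptions mirroring the mitigations above):

```python
def should_revert_to_premium(approval_history: list,
                             baseline: float = 0.90,
                             floor: float = 0.80,
                             max_relative_drop: float = 0.10) -> bool:
    """Auto-revert if the budget-tier approval rate falls below an absolute
    floor, or drops more than 10% relative to the premium baseline.
    approval_history is a list of 1/0 approval outcomes."""
    if not approval_history:
        return False
    rate = sum(approval_history) / len(approval_history)
    return rate < floor or (baseline - rate) / baseline > max_relative_drop
```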
Risk: eCPM thresholds miscalibrated, losing money on expensive generations.
Mitigation:
- Default to budget tier unless eCPM exceeds 2× generation cost
- Continuous profit analysis per tier
- Threshold adjustment automation
Risk: Primary model down or rate-limited, fallback needed.
Mitigation:
- Fallback chain: Gemini 2.0 Flash → Groq Llama → Claude Haiku
- Cache model availability status (Redis, 1min TTL)
- Alert if fallback rate >5%
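A minimal sketch of the fallback chain (model identifiers and the error-handling contract of `call_model` are assumptions; in production you would catch provider-specific exceptions rather than bare `Exception`):

```python
FALLBACK_CHAIN = ['gemini-2.0-flash', 'groq-llama-3.1-8b', 'claude-haiku']

def generate_with_fallback(prompt: str, call_model, chain=FALLBACK_CHAIN):
    """Try each model in order; call_model(model, prompt) is assumed to raise
    on outage or rate limiting. Returns (model_used, response)."""
    last_err = None
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as err:  # narrow this to provider errors in production
            last_err = err
    raise RuntimeError('all models in fallback chain failed') from last_err
```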
| Model | Provider | Input | Output | Speed | Context |
|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | Fast | 400K |
| Gemini 3 Pro | Google | $2.00 | $12.00 | Fast | 200K |
| Gemini 3 Flash | Google | $0.50 | $3.00 | Very Fast | 1M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | Very Fast | 1M |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | Ultra Fast | 128K |
| Mixtral 8x7B | Groq | $0.27 | $0.27 | Very Fast | 32K |
| Claude Haiku | Anthropic | $1.00 | $5.00 | Fast | 200K |
| Model | Provider | Resolution | Cost | Speed |
|---|---|---|---|---|
| Imagen 3 | Google | 1024×1024 | $0.030 | ~8s |
| FLUX.1 [pro] | Replicate | 1024×1024 | $0.055 | ~10s |
| FLUX.1 [dev] | Replicate | 1024×1024 | $0.030 | ~6s |
| FLUX.1 [schnell] | Replicate | 1024×1024 | $0.003 | ~2s |
- GPT-5.2 API Pricing
- GPT-5.2 Pricing Calculator
- Gemini API Pricing
- Gemini 3 Pricing Guide
- Groq Pricing
- Replicate Pricing
- Claude API Pricing
- AI Image Model Pricing
End of Document