
Dynamic Model Selection for Contextual Ad Generation

Author: Albert Bentov
Date: 2026-02-11
Status: Design Proposal


Simple Deterministic Interface for Campaign Economics

Date: Sun, Feb 15th 2026

Overview

Before implementing ML-based routing, start with a spreadsheet-based interface per campaign where users input economics and receive deterministic model + token recommendations.

Interface Design (Spreadsheet/UI)

User Inputs (per campaign):

| Parameter | Description | Example Value |
|---|---|---|
| ecpm | Actual revenue per 1000 impressions ($) | $3.50 |
| target_roi | Required return on investment (%) | 150% (1.5x) |
| campaign_type | brand_awareness or performance | performance |
| max_cost_per_gen | Budget constraint per generation ($) | $0.010 |
| expected_ctr | Target click-through rate (%) | 0.4% |
| domain_tier | premium, standard, or low | standard |
| monthly_impressions | Forecasted impression volume | 800,000 |

System Recommendations (output):

| Output | Description | Example Value |
|---|---|---|
| model_tier | premium, balanced, or budget | budget |
| token_setting | Input context size | low (title + para1) |
| llm_model | Specific model to use | gemini-2-flash |
| image_model | Image generation model | flux-schnell |
| cost_per_generation | Expected generation cost | $0.0086 |
| breakeven_ecpm | Minimum eCPM for profitability at target ROI | $12.90 |
| monthly_cost | Projected monthly spend (800,000 × $0.0086) | $6,880 |
| monthly_profit | Projected profit (eCPM × volume / 1000 - cost) | -$4,080 |
| recommendation | Profitability assessment | ⚠️ Not profitable at $3.50 eCPM |
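The output columns can be derived deterministically from the inputs. A minimal sketch, assuming one generation per impression (the convention used in the break-even analysis of section 6):

```python
def campaign_economics(ecpm: float, target_roi: float,
                       cost_per_generation: float,
                       monthly_impressions: int) -> dict:
    """Derive the spreadsheet outputs from the user inputs.

    Assumes one generation per impression, matching the break-even
    convention in section 6. Amortizing generations across impressions
    would lower the effective cost.
    """
    breakeven_ecpm = cost_per_generation * target_roi * 1000
    monthly_cost = monthly_impressions * cost_per_generation
    monthly_revenue = ecpm * monthly_impressions / 1000
    return {
        'breakeven_ecpm': round(breakeven_ecpm, 2),   # 12.9 for budget tier
        'monthly_cost': round(monthly_cost, 2),
        'monthly_profit': round(monthly_revenue - monthly_cost, 2),
        'profitable': ecpm >= breakeven_ecpm,
    }

print(campaign_economics(3.50, 1.5, 0.0086, 800_000))
```

With the example inputs ($3.50 eCPM, 1.5x ROI, $0.0086/gen, 800,000 impressions) this yields a $12.90 break-even eCPM and a monthly loss, which is why amortization or a cheaper tier matters.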

Token Settings Definition

| Setting | Article Input | Input Tokens | Use Case |
|---|---|---|---|
| high | Full article (title + body) | ~2,500 tokens | Premium campaigns, complex content |
| medium | Title + first 2-3 paragraphs | ~800 tokens | Balanced quality/cost |
| low | Title + first paragraph | ~400 tokens | Budget campaigns, simple content |

Decision Logic (Deterministic Rules)

def recommend_model_config(
    ecpm: float,
    target_roi: float,
    max_cost_per_gen: float,
    campaign_type: str,
    domain_tier: str
) -> dict:
    """
    Deterministic model + token recommendation based on campaign economics.

    Returns config with profitability check and provider tier bins.
    """

    # Calculate max allowable cost per generation
    max_cost = (ecpm / 1000) / target_roi
    max_cost = min(max_cost, max_cost_per_gen)  # Respect budget constraint

    # Decision tree with provider tier bins
    if max_cost >= 0.040 and (domain_tier == 'premium' or campaign_type == 'brand_awareness'):
        return {
            'model_tier': 'premium',
            'token_setting': 'high',
            'llm_providers': 'OpenAI GPT (flagship), Anthropic Claude (flagship), Google Gemini (pro tier)',
            'image_providers': 'Google Imagen, Stability AI (premium), Midjourney',
            'cost_range': '$0.040 - $0.050',
            'cost_per_generation': 0.0443,
            'profitable': max_cost >= 0.0443
        }

    elif max_cost >= 0.010:
        return {
            'model_tier': 'balanced',
            'token_setting': 'medium',
            'llm_providers': 'Google Gemini (flash tier), Anthropic Claude (fast tier), Together AI',
            'image_providers': 'Replicate FLUX (dev tier), Stability AI (standard)',
            'cost_range': '$0.010 - $0.015',
            'cost_per_generation': 0.0117,
            'profitable': max_cost >= 0.0117
        }

    elif max_cost >= 0.003:
        return {
            'model_tier': 'budget',
            'token_setting': 'low',
            'llm_providers': 'Google Gemini (flash-lite), Fireworks AI, Anyscale',
            'image_providers': 'Replicate FLUX (fast tier), Leonardo AI (fast)',
            'cost_range': '$0.003 - $0.010',
            'cost_per_generation': 0.0086,
            'profitable': max_cost >= 0.0086
        }

    else:
        return {
            'model_tier': 'ultra-budget',
            'token_setting': 'low',
            'llm_providers': 'Groq (Llama/Mixtral), Together AI (open models), Hugging Face Inference',
            'image_providers': 'Replicate FLUX (schnell), Stability AI (turbo)',
            'cost_range': '$0.001 - $0.003',
            'cost_per_generation': 0.0032,
            'profitable': max_cost >= 0.0032
        }
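Tracing the decision tree with Campaign C's inputs (from the use cases below) shows which branch fires. A condensed sketch; the premium branch's domain/campaign checks are omitted here because max_cost already falls below its $0.040 threshold:

```python
# Campaign C: eCPM $6.50, target ROI 1.2x, budget cap $0.020
ecpm, target_roi, max_cost_per_gen = 6.50, 1.2, 0.020

max_cost = min((ecpm / 1000) / target_roi, max_cost_per_gen)  # ~ $0.0054

# Same bin boundaries as recommend_model_config above
tier = ('premium' if max_cost >= 0.040 else
        'balanced' if max_cost >= 0.010 else
        'budget' if max_cost >= 0.003 else
        'ultra-budget')
print(tier)  # budget
```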

Major Inference Providers (Reference)

Text Generation:

  • OpenAI - GPT models (premium tier, high quality)
  • Anthropic - Claude models (premium tier, reasoning-focused)
  • Google - Gemini models (pro/flash tiers, good balance)
  • Groq - Open models (Llama, Mixtral) with ultra-fast inference
  • Together AI - Open source models, competitive pricing
  • Fireworks AI - Fast inference, budget-friendly
  • Anyscale - Llama/Mistral hosting, good for scale
  • Replicate - Various open models
  • Hugging Face - Inference API for open models

Image Generation:

  • Google - Imagen models (premium quality)
  • Stability AI - Stable Diffusion variants (multiple tiers)
  • Replicate - FLUX, Stable Diffusion, various models
  • Midjourney - High quality (limited API access)
  • Leonardo AI - Fast generation, multiple style presets
  • RunwayML - Creative AI tools, image generation
  • OpenAI - DALL-E models (premium tier)

Required Stats from PostgreSQL

To populate the interface and validate economics, collect these metrics:

From generations table (or equivalent):

-- Cost per generation by campaign
SELECT campaign_id,
       COUNT(*) as generation_count,
       AVG(model_cost) as avg_cost_per_gen,  -- if tracked
       AVG(token_count_input) as avg_input_tokens,
       AVG(token_count_output) as avg_output_tokens
FROM generations
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id;

From impressions table:

-- Impressions and CTR by campaign/domain
SELECT campaign_id,
       domain,
       COUNT(*) as impression_count,
       SUM(CASE WHEN clicked = 1 THEN 1 ELSE 0 END) as click_count,
       AVG(CASE WHEN clicked = 1 THEN 1.0 ELSE 0.0 END) as ctr,
       AVG(ecpm) as avg_ecpm  -- if available
FROM impressions
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id, domain;

From RTB win data (if available):

-- Actual revenue per impression
SELECT campaign_id,
       AVG(win_price) as avg_win_price,
       COUNT(*) as win_count,
       AVG(bid_price) as avg_bid_price
FROM rtb_wins
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id;

From controlled_ads table:

-- Human approval rate (quality proxy)
SELECT campaign_id,
       COUNT(*) as total_generated,
       SUM(CASE WHEN type = 2 THEN 1 ELSE 0 END) as approved_count,
       AVG(CASE WHEN type = 2 THEN 1.0 ELSE 0.0 END) as approval_rate
FROM controlled_ads
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY campaign_id;
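The rows returned by the queries above map onto the interface inputs. A hedged sketch; the field names follow the SELECT aliases, and merging the per-campaign rows into one dict is assumed to happen upstream:

```python
def derive_campaign_inputs(stats: dict) -> dict:
    """Turn the SQL aggregates above into interface inputs.

    Keys follow the SELECT aliases in the queries (an assumption about
    how the rows are merged per campaign).
    """
    ecpm = stats['avg_win_price'] * 1000  # win price is per impression
    ctr = stats['click_count'] / stats['impression_count']
    # One generation can serve many impressions, so amortize its cost
    amortized_cost = (stats['generation_count'] * stats['avg_cost_per_gen']
                      / stats['impression_count'])
    return {'ecpm': ecpm, 'expected_ctr': ctr,
            'cost_per_impression': amortized_cost}

row = {'avg_win_price': 0.0035, 'click_count': 3200,
       'impression_count': 800_000, 'generation_count': 50_000,
       'avg_cost_per_gen': 0.0086}
print(derive_campaign_inputs(row))
```

Note the amortized cost per impression can be far below cost per generation when one approved ad is served many times, which softens the per-impression break-even math used elsewhere in this document.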

Implementation

  1. Spreadsheet template (Google Sheets/Excel) with formulas:

    • Input cells: eCPM, ROI target, campaign type, etc.
    • Calculation cells: max allowable cost, breakeven eCPM
    • Recommendation cells: model tier, token setting, profitability check
  2. Simple Python script (recommend_config.py):

    • Read campaign params from CSV/JSON
    • Apply decision logic
    • Output recommended config + economics forecast
    • Flag unprofitable campaigns
  3. Dashboard integration (future):

    • Per-campaign config UI
    • Real-time cost/profit tracking
    • One-click apply to production
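The recommend_config.py script in step 2 could be as small as the following sketch. The CSV columns and the condensed tier logic are assumptions (the premium branch is omitted since it also needs domain/campaign-type checks):

```python
import csv
import io
import json

def recommend(row: dict) -> dict:
    """Apply a condensed version of the decision logic to one CSV row."""
    max_cost = min(float(row['ecpm']) / 1000 / float(row['target_roi']),
                   float(row['max_cost_per_gen']))
    if max_cost >= 0.010:
        tier, cost = 'balanced', 0.0117
    elif max_cost >= 0.003:
        tier, cost = 'budget', 0.0086
    else:
        tier, cost = 'ultra-budget', 0.0032
    return {'campaign_id': row['campaign_id'], 'model_tier': tier,
            'profitable': max_cost >= cost}  # flag unprofitable campaigns

# Stand-in for a campaigns.csv file (Campaign A from the use cases)
csv_text = "campaign_id,ecpm,target_roi,max_cost_per_gen\nA,0.80,1.5,0.010\n"
for row in csv.DictReader(io.StringIO(csv_text)):
    print(json.dumps(recommend(row)))
```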

Example Use Cases

Campaign A: Low-tier performance campaign

Inputs:

  • eCPM: $0.80 (actual paid)
  • Target ROI: 150% (1.5x)
  • Max cost per gen: $0.010
  • Campaign type: performance
  • Expected CTR: 0.3%
  • Domain tier: low

Calculations:

  • Max allowable cost = ($0.80 / 1000) / 1.5 = $0.00053
  • Budget constraint: $0.010

Recommendation:

  • Model tier: budget (cost $0.0086)
  • Token setting: low (400 tokens)
  • Models: gemini-2-flash + flux-schnell
  • Breakeven eCPM: $12.90 (at 1.5x ROI)
  • Assessment: ⚠️ Not profitable (eCPM too low, need $12.90+)
  • Suggestion: Reject campaign or negotiate higher eCPM

Campaign B: Mid-tier performance campaign

Inputs:

  • eCPM: $4.50 (actual paid)
  • Target ROI: 150% (1.5x)
  • Max cost per gen: $0.015
  • Campaign type: performance
  • Expected CTR: 0.4%
  • Domain tier: standard

Calculations:

  • Max allowable cost = ($4.50 / 1000) / 1.5 = $0.0030

Recommendation:

  • Model tier: budget (cost $0.0086)
  • Token setting: low (400 tokens)
  • Models: gemini-2-flash + flux-schnell
  • Breakeven eCPM: $12.90 (at 1.5x ROI)
  • Assessment: ⚠️ Not profitable (eCPM $4.50 < breakeven $12.90; per impression: $0.0045 - $0.0086 = -$0.0041)
  • Suggestion: Drop to ultra-budget ($0.0032): $0.0045 - $0.0032 = $0.0013 per impression (1.4x return, just under the 1.5x target)

Campaign C: Premium brand campaign

Inputs:

  • eCPM: $6.50 (actual paid)
  • Target ROI: 120% (1.2x)
  • Max cost per gen: $0.020
  • Campaign type: brand_awareness
  • Expected CTR: 0.5% (high quality)
  • Domain tier: premium

Calculations:

  • Max allowable cost = ($6.50 / 1000) / 1.2 = $0.0054

Recommendation:

  • Model tier: budget (cost $0.0086)
  • Token setting: low (400 tokens)
  • Models: gemini-2-flash + flux-schnell
  • Assessment: ⚠️ Not profitable (cost $0.0086 > allowable $0.0054)
  • Alternative: Use ultra-budget Groq + flux-schnell (cost $0.0032) → ✅ Profitable
  • Profit per impression: $0.0065 - $0.0032 = $0.0033 (✅ 203% ROI)

Summary

This document proposes a staged approach to intelligent model selection in contextual ad generation:

  1. Simple Approach (immediate): Rule-based routing for current production based on expected click value (eCPM) and domain reputation
  2. Advanced Approach (post-online learning): Learned routing integrated with quality predictor (Ĉ) and performance predictor (P̂)

Expected Impact:

  • 50-70% cost reduction on low-value impressions
  • Maintained quality on high-value impressions
  • 2-5x faster response times for most requests
  • Profitability threshold enforcement per impression

Table of Contents

  1. Problem Statement
  2. API Cost Analysis (2026)
  3. Simple Approach: Rule-Based Routing
  4. Advanced Approach: Learned Routing
  5. Fast Wins: Input Token Optimization
  6. ROI Analysis & Break-Even Scenarios
  7. Implementation Roadmap
  8. Risk Mitigation

1. Problem Statement

Current State

Our production system (ControlledAd.py) serves contextual ads with human-in-the-loop approval:

  1. Fetch article (title + body)
  2. Generate embeddings (256d)
  3. Find anchor ad via similarity search
  4. Exploration trigger: when predefined categories or approved candidates fail the similarity threshold
  5. Brand safety check (LLM call on title + content)
  6. Generate candidate variants (LLM with mega-prompt: brand + styling + strategies + few-shot + safety instructions)
  7. Generate image
  8. Human approval → serve winning ad

Problem: We use expensive models uniformly regardless of:

  • Expected click value (advertiser's willingness to pay)
  • Domain quality/reputation (premium publishers vs low-traffic blogs)
  • Content complexity (simple product ads vs nuanced brand campaigns)

Result: Unprofitable on low-eCPM impressions, over-engineered for simple contexts.

Goal

Dynamically select model tier (LLM size, image generation quality) based on:

$$ \text{ModelTier} = f(\text{eCPM}, \text{domainStats}, \text{contentComplexity}) $$

Constraint: Maintain quality standards while maximizing profit margin per impression.


2. API Cost Analysis (2026)

2.1 LLM Pricing (Text Generation)

| Model | Provider | Input ($/1M tokens) | Output ($/1M tokens) | Context | Speed | Use Case |
|---|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 400K | Fast | Premium tier |
| Gemini 3 Pro | Google | $2.00 | $12.00 | 200K | Fast | Premium tier |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M | Very Fast | Balanced tier |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M | Very Fast | Budget tier |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | Ultra Fast | Ultra-budget |
| Mixtral 8x7B | Groq | $0.27 | $0.27 | 32K | Very Fast | Budget alternative |
| Claude Haiku | Anthropic | $1.00 | $5.00 | 200K | Fast | Budget fallback |

Key Observations:

  • Gemini 2.0 Flash is 20x cheaper than Gemini 3 Pro on input, and 17.5x/35x cheaper than GPT-5.2 on input/output
  • Groq inference is 35-40x cheaper than premium models with acceptable quality trade-offs
  • Context caching available on Gemini (75% savings on repeated prompts, cache reads at 10% of input price)
  • GPT-5.2 generates internal "thinking" tokens billed as output ($14/1M)


2.2 Image Generation Pricing

| Model | Provider | Resolution | Cost per Image | Speed | Use Case |
|---|---|---|---|---|---|
| Imagen 3 | Google | 1024×1024 | $0.030 | ~8s | Premium tier |
| FLUX.1 [pro] | Replicate | 1024×1024 | $0.055 | ~10s | High quality |
| FLUX.1 [dev] | Replicate | 1024×1024 | $0.030 | ~6s | Balanced tier |
| FLUX.1 [schnell] | Replicate | 1024×1024 | $0.003 | ~2s | Budget tier |

Key Observations:

  • Flux schnell is 10x cheaper than Imagen 3 with acceptable quality
  • 2-3 second generation time enables real-time workflows
  • Flux dev offers good balance (same price as Imagen 3, faster)


2.3 Realistic Request Cost Breakdown

Production mega-prompt structure:

  • Brand description: ~200 tokens
  • Styling instructions: ~300 tokens
  • Strategy guidelines: ~200 tokens
  • Few-shot examples (3-5 examples): ~500 tokens
  • Safety instructions: ~150 tokens
  • Article content (full): ~2500 tokens
  • Total input: ~3,850 tokens

Brand safety call:

  • System prompt: ~200 tokens
  • Article (title + content): ~2500 tokens
  • Total safety check: ~2,700 tokens

Scenario: Generate 1 contextual ad with exploration triggered

| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | GPT-5.2 (current prod) | $0.00543 |
| Article embedding | 2,500 input | text-embedding-3-small | $0.00005 |
| Tagline generation | 3,850 input + 150 output | GPT-5.2 or Gemini 3 Pro | $0.00884 |
| Image generation | 1 image | Imagen 3 | $0.03000 |
| **Total (Premium)** | | | **$0.0443** |

Alternative (Budget):

| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | GPT-5.2 (unchanged) | $0.00543 |
| Article embedding | 800 input (title + para1) | text-embedding-3-small | $0.00002 |
| Tagline generation (compact) | 1,200 input + 150 output | Gemini 2.0 Flash | $0.00018 |
| Image generation | 1 image | Flux schnell | $0.00300 |
| **Total (Budget)** | | | **$0.0086** |

Savings: 81% cost reduction per generation (91% on the components that change; the safety check is deliberately left on GPT-5.2)

Ultra-budget (Groq):

| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | Llama 3.1 8B (Groq) | $0.00014 |
| Tagline generation (compact) | 1,200 input + 150 output | Llama 3.1 8B (Groq) | $0.00007 |
| Image generation | 1 image | Flux schnell | $0.00300 |
| **Total (Ultra-budget)** | | | **$0.0032** |

Savings: 92% cost reduction, 5x faster
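The scenario totals above fall out directly from the per-token prices in section 2.1. A small sanity-check calculator (the PRICES keys are shorthand labels, not provider API model IDs):

```python
# $/1M-token (input, output) prices from the tables in section 2.1
PRICES = {
    'gpt-5.2': (1.75, 14.00),
    'gemini-2.0-flash': (0.10, 0.40),
    'llama-3.1-8b-groq': (0.05, 0.08),
}

def llm_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one LLM call in dollars."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Premium scenario: safety + tagline on GPT-5.2, embedding, Imagen 3
premium = (llm_cost('gpt-5.2', 2700, 50)      # brand safety check
           + 0.00005                           # article embedding
           + llm_cost('gpt-5.2', 3850, 150)    # tagline generation
           + 0.030)                            # Imagen 3 image

# Ultra-budget scenario: everything on Groq + Flux schnell
ultra = (llm_cost('llama-3.1-8b-groq', 2700, 50)
         + llm_cost('llama-3.1-8b-groq', 1200, 150)
         + 0.003)                              # Flux schnell image

print(round(premium, 4), round(ultra, 4))  # 0.0443 0.0032
```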


3. Simple Approach: Rule-Based Routing

3.1 Routing Logic

def select_model_tier(
    ecpm: float,              # Expected CPM ($/1000 impressions)
    domain_quality: str,      # 'premium' | 'standard' | 'low'
    content_length: int,      # Article word count
    campaign_type: str        # 'brand_awareness' | 'performance'
) -> dict:
    """
    Simple rule-based model selection.

    Returns:
        {
            'llm': str,
            'llm_tier': 'premium' | 'balanced' | 'budget',
            'image': str,
            'image_tier': 'premium' | 'balanced' | 'budget',
            'input_mode': 'full_article' | 'title_plus_para1',
            'max_cost': float
        }
    """

    # Profitability threshold (must cover at least 2x generation cost)
    MIN_ECPM_PREMIUM = 10.0   # $10 eCPM = $0.010 per impression
    MIN_ECPM_BALANCED = 3.0   # $3 eCPM = $0.003 per impression

    # Decision tree
    if ecpm >= MIN_ECPM_PREMIUM and domain_quality == 'premium':
        # High-value, premium publishers → best quality
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'imagen-3',
            'image_tier': 'premium',
            'input_mode': 'full_article',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0443
        }

    elif ecpm >= MIN_ECPM_BALANCED and domain_quality in ['premium', 'standard']:
        # Mid-value, good publishers → balanced
        return {
            'llm': 'gemini-3-flash',
            'llm_tier': 'balanced',
            'image': 'flux-dev',
            'image_tier': 'balanced',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0117
        }

    elif campaign_type == 'brand_awareness':
        # Brand campaigns → prioritize quality over cost
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'flux-dev',  # Balanced image sufficient
            'image_tier': 'balanced',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0159
        }

    else:
        # Low-value or unproven domains → budget
        return {
            'llm': 'gemini-2-flash',
            'llm_tier': 'budget',
            'image': 'flux-schnell',
            'image_tier': 'budget',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0086
        }

3.2 Domain Quality Classification

Data Sources (existing in production):

  • Impression count (from impressions table)
  • CTR history (clicks / impressions per domain)
  • Human approval rate (from controlled_ads type=2 vs type=-1)
  • Publisher whitelist/blacklist

Simple Heuristic:

def classify_domain_quality(domain: str) -> str:
    """Classify domain based on historical stats."""
    stats = get_domain_stats(domain)

    if domain in PREMIUM_WHITELIST:
        return 'premium'

    if stats['impression_count'] > 10000 and stats['ctr'] > 0.02:
        return 'premium'

    if stats['impression_count'] > 1000 and stats['ctr'] > 0.01:
        return 'standard'

    return 'low'

3.3 Integration into ControlledAd.py

Modification point before exploration trigger:

def _trigger_exploration_async(self, selected_ad: Dict | None) -> None:
    """Trigger exploration with dynamic model selection."""

    # NEW: Select model tier before generation
    model_config = select_model_tier(
        ecpm=self.calculate_ecpm(),
        domain_quality=self.classify_domain(),
        content_length=len(self.article_text.split()),
        campaign_type=self.campaign_type
    )

    # Store config for exploration method to use
    self.model_config = model_config

    # Existing exploration logic...
    if self.cache.get_from_cache(self.key_lock_exploration):
        return

    self.cache.update_cache(
        self.key_lock_exploration,
        {'exploration_in_progress': 1},
        EXPIRATION_60_SEC
    )

    if self.exploration_method:
        async_call(self._execute_exploration_on_copy, selected_ad)

3.4 Expected Impact

Traffic Distribution (estimated):

| Tier | % Traffic | Avg eCPM | Current Cost | New Cost | Savings |
|---|---|---|---|---|---|
| Premium | 15% | $12.00 | $0.0443 | $0.0443 | $0 |
| Balanced | 35% | $5.00 | $0.0443 | $0.0117 | $0.0326 |
| Budget | 50% | $1.50 | $0.0443 | $0.0086 | $0.0357 |

Total Savings: (0.35 × $0.0326) + (0.50 × $0.0357) = $0.0293 per impression (66% reduction)
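The blended arithmetic can be checked in two lines:

```python
# (traffic share, cost per generation) per tier, from the table above
mix = {'premium': (0.15, 0.0443),
       'balanced': (0.35, 0.0117),
       'budget': (0.50, 0.0086)}

old_cost = 0.0443  # uniform premium cost today
new_cost = sum(share * cost for share, cost in mix.values())
print(round(new_cost, 4), round(old_cost - new_cost, 4))  # 0.015 0.0293
```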

Annual Impact (1M impressions/month):

  • Current: $44,300/month
  • New: $15,040/month
  • Savings: $29,260/month ($351,120/year)

4. Advanced Approach: Learned Routing

4.1 Integration with Online Learning System

Once the self-learning framework (Ĉ, P̂, DSPy) is operational, upgrade routing to use learned signals:

def select_model_tier_learned(
    context: dict,           # Brand, article, domain
    C_hat_threshold: float = 0.7,  # Quality predictor threshold
    P_hat_threshold: float = 0.02, # Performance predictor threshold
    ecpm: float = None
) -> dict:
    """
    Learned model selection using quality and performance predictors.

    Key insight: If we predict high approval (Ĉ) and high CTR (P̂),
    it's worth investing in premium models. Otherwise, use budget.
    """

    # Quick quality pre-check using Ĉ on anchor ad
    anchor_quality = C_hat(context['brand'], context['article'], context['anchor'])

    # Predicted performance using P̂ on anchor
    predicted_ctr = P_hat(context['article'], context['anchor'])

    # Calculate expected value of premium vs budget generation
    premium_value = (
        predicted_ctr * 1.2 *   # Assume 20% CTR lift from premium models
        ecpm / 1000 -           # Revenue per impression
        0.0443                  # Premium cost (section 2.3)
    )

    budget_value = (
        predicted_ctr *         # No CTR lift assumption
        ecpm / 1000 -           # Revenue per impression
        0.0086                  # Budget cost (section 2.3)
    )

    # Decision: use premium only if EV is higher
    if premium_value > budget_value and anchor_quality > C_hat_threshold:
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'imagen-3',
            'image_tier': 'premium',
            'input_mode': 'full_article',
            'expected_value': premium_value,
            'reason': f'High quality ({anchor_quality:.2f}) + high CTR ({predicted_ctr:.3f})'
        }

    else:
        return {
            'llm': 'gemini-2-flash',
            'llm_tier': 'budget',
            'image': 'flux-schnell',
            'image_tier': 'budget',
            'input_mode': 'title_plus_para1',
            'expected_value': budget_value,
            'reason': f'Budget sufficient (quality={anchor_quality:.2f}, CTR={predicted_ctr:.3f})'
        }

4.2 Multi-Armed Bandit for Model Tier Selection

Treat model tier selection as a contextual bandit problem:

Context: (brand_id, domain_tier, content_category, article_length)
Actions: (premium, balanced, budget)
Reward: (revenue - cost) per impression

import random
from collections import defaultdict

class ModelTierBandit:
    """Contextual bandit for model tier selection."""

    def __init__(self, epsilon: float = 0.1, alpha: float = 0.1):
        self.epsilon = epsilon                 # exploration rate
        self.alpha = alpha                     # learning rate
        self.context_encoder = embed_context   # maps context dict -> hashable key
        self.Q_table = defaultdict(lambda: {'premium': 0.0, 'balanced': 0.0, 'budget': 0.0})

    def select_tier(self, context: dict) -> str:
        """Select model tier using ε-greedy policy."""
        context_key = self.context_encoder(context)
        if random.random() < self.epsilon:
            return random.choice(['premium', 'balanced', 'budget'])
        return max(self.Q_table[context_key], key=self.Q_table[context_key].get)

    def update(self, context: dict, tier: str, reward: float):
        """Incremental Q-value update after observing reward."""
        context_key = self.context_encoder(context)
        old_q = self.Q_table[context_key][tier]
        self.Q_table[context_key][tier] = old_q + self.alpha * (reward - old_q)
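The ε-greedy update rule can be exercised standalone. A toy simulation, assuming the deterministic per-impression rewards from Scenario 1 in section 6.3 (real rewards would be noisy):

```python
import random
from collections import defaultdict

random.seed(0)
EPSILON, ALPHA = 0.1, 0.1
Q = defaultdict(lambda: {'premium': 0.0, 'balanced': 0.0, 'budget': 0.0})

# Per-impression (revenue - cost) at $5 eCPM, from Scenario 1 below
true_reward = {'premium': -0.0393, 'balanced': -0.0067, 'budget': -0.0036}

for _ in range(2000):
    ctx = ('brand-42', 'standard')          # one fixed context for the demo
    if random.random() < EPSILON:
        tier = random.choice(list(Q[ctx]))  # explore
    else:
        tier = max(Q[ctx], key=Q[ctx].get)  # exploit
    Q[ctx][tier] += ALPHA * (true_reward[tier] - Q[ctx][tier])

best = max(Q[('brand-42', 'standard')], key=Q[('brand-42', 'standard')].get)
print(best)  # budget
```

After enough pulls the Q-values approach the true rewards and the greedy choice settles on the least-unprofitable arm, here the budget tier.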

5. Fast Wins: Input Token Optimization

5.1 Current Input (Full Article + Mega Prompt)

Typical production prompt:

  • Mega-prompt components: ~1,350 tokens
    • Brand description: 200
    • Styling instructions: 300
    • Strategy guidelines: 200
    • Few-shot examples: 500
    • Safety instructions: 150
  • Article (full): ~2,500 tokens
  • Total input: ~3,850 tokens

Brand safety call:

  • Article (title + content): ~2,500 tokens
  • Safety prompt: ~200 tokens
  • Total: ~2,700 tokens

5.2 Optimized Input (Title + First Paragraph)

Reduced input:

  • Mega-prompt components: ~1,350 tokens (same)
  • Article (title + para1): ~400 tokens
  • Total input: ~1,750 tokens

Brand safety call (unchanged):

  • Still uses full article for safety: ~2,700 tokens

Savings: 54% input token reduction on generation (safety unchanged for quality)

5.3 Cost Impact

| Model | Full Article Cost | Compact Cost | Savings |
|---|---|---|---|
| GPT-5.2 | $0.00884 | $0.00401 | $0.00483 (55%) |
| Gemini 3 Pro | $0.00950 | $0.00431 | $0.00519 (55%) |
| Gemini 2.0 Flash | $0.00039 | $0.00018 | $0.00021 (54%) |

5.4 Implementation

def build_compact_prompt(self, context: dict) -> str:
    """Build prompt using only title + first paragraph."""

    article_title = context['article']['title']
    article_body = context['article']['body']

    # Extract first paragraph (split by \n\n or first 150 words)
    first_paragraph = self.extract_first_paragraph(article_body, max_words=150)

    # Mega-prompt components (unchanged)
    mega_prompt = self.build_mega_prompt_base(context['brand'])

    prompt = f"""{mega_prompt}

Article title: {article_title}
Article excerpt: {first_paragraph}

Anchor tagline: {context['anchor']['tagline']}

Generate contextual tagline variant following brand guidelines above.

Tagline:
"""

    return prompt
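The extract_first_paragraph helper referenced above is not shown in this document; a minimal sketch of one plausible implementation (split on blank lines, then cap the word count):

```python
import re

def extract_first_paragraph(body: str, max_words: int = 150) -> str:
    """First blank-line-separated paragraph, capped at max_words words.

    Hypothetical implementation of the helper used by
    build_compact_prompt above.
    """
    first = re.split(r'\n\s*\n', body.strip(), maxsplit=1)[0]
    return ' '.join(first.split()[:max_words])

article = "Rates held steady today.\n\nAnalysts expect cuts next quarter."
print(extract_first_paragraph(article))  # Rates held steady today.
```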

6. ROI Analysis & Break-Even Scenarios

6.1 Cost-Revenue Model

Profit per impression:

$$ \text{Profit} = \frac{\text{eCPM}}{1000} - C_{gen} $$

Or for CPC campaigns:

$$ \text{Profit} = \text{CTR} \times \text{CPC} - C_{gen} $$

6.2 Break-Even Analysis

| Model Tier | $C_{gen}$ | Break-even eCPM (2× margin) | Break-even CPC (1% CTR) |
|---|---|---|---|
| Premium (GPT-5.2 + Imagen) | $0.0443 | $88.60 | $4.43 |
| Balanced (Gemini 3 Flash + Flux) | $0.0117 | $23.40 | $1.17 |
| Budget (Gemini 2 Flash + Flux) | $0.0086 | $17.20 | $0.86 |
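The table follows directly from the cost model; note the eCPM column bakes in the 2× margin while the CPC column is raw cost divided by CTR:

```python
def breakeven_ecpm(cost_per_gen: float, margin: float = 2.0) -> float:
    """eCPM at which revenue covers margin × generation cost."""
    return cost_per_gen * margin * 1000

def breakeven_cpc(cost_per_gen: float, ctr: float = 0.01) -> float:
    """CPC at which CTR × CPC covers the generation cost (no margin)."""
    return cost_per_gen / ctr

for tier, cost in [('Premium', 0.0443), ('Balanced', 0.0117), ('Budget', 0.0086)]:
    print(tier, round(breakeven_ecpm(cost), 2), round(breakeven_cpc(cost), 2))
```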

Interpretation:

  • Premium tier requires $88.60+ eCPM to be profitable with 2× margin
  • Budget tier breaks even at $8.60 eCPM ($17.20 with 2× margin), achievable on many campaigns
  • Ultra-budget (Groq + Flux schnell, $0.0032) breaks even at $3.20 eCPM

6.3 Scenario Analysis

Scenario 1: Mid-value campaign (eCPM = $5, CTR = 1.5%)

| Tier | Cost | Revenue | Profit | ROI |
|---|---|---|---|---|
| Premium | $0.0443 | $0.0050 | -$0.0393 | -88.7% |
| Balanced | $0.0117 | $0.0050 | -$0.0067 | -57.3% |
| Budget | $0.0086 | $0.0050 | -$0.0036 | -41.9% |

Conclusion: No tier in the table is profitable at $5 eCPM with one generation per impression; only the ultra-budget tier ($0.0032) clears break-even here.

Scenario 2: Premium publisher (CPC = $8, CTR = 2.5%)

| Tier | Cost | Revenue (CTR × CPC) | Profit | ROI |
|---|---|---|---|---|
| Premium | $0.0443 | $0.20 | $0.1557 | +351.5% |
| Balanced | $0.0117 | $0.20 | $0.1883 | +1609.4% |
| Budget | $0.0086 | $0.20 × 0.95 | $0.1814 | +2109.3% |

Insight: Even with -5% quality penalty, budget tier delivers highest ROI. Premium justified only for brand-sensitive campaigns.


7. Implementation Roadmap

Phase 1: Simple Rule-Based Routing

Deliverables:

  1. select_model_tier() function with eCPM + domain quality routing
  2. Domain quality classifier (premium/standard/low)
  3. Integration into ControlledAd._trigger_exploration_async()
  4. Logging: model_tier, generation_cost, decision_reason

Success criteria:

  • 50% of traffic routed to budget tier
  • No drop in approval rate
  • Cost savings confirmed

Phase 2: Input Token Optimization

Deliverables:

  1. build_compact_prompt() using title + para1
  2. A/B test framework (50/50 split)
  3. Quality monitoring dashboard

Success criteria:

  • <5% approval rate drop
  • <3% CTR drop
  • 54% input token savings confirmed

Phase 3: Adaptive Thresholds

Deliverables:

  1. Historical analysis: profit vs tier by campaign
  2. Per-campaign threshold learning
  3. Threshold update automation

Success criteria:

  • 10% additional profit vs fixed thresholds
  • Thresholds stable (not oscillating)

Phase 4: Learned Routing

Deliverables:

  1. select_model_tier_learned() using Ĉ and P̂
  2. Expected value calculation framework
  3. Bandit policy for exploration

Prerequisites:

  • Ĉ (quality predictor) trained and deployed
  • P̂ (performance predictor) trained and deployed
  • Propensity logging operational

Success criteria:

  • 15% profit improvement vs rule-based
  • Bandit policy converges

8. Risk Mitigation

8.1 Quality Degradation Risk

Risk: Budget models produce lower quality, reducing approval rate and CTR.

Mitigation:

  1. Start with conservative thresholds (only low-value traffic to budget)
  2. Monitor approval rate daily, alert if <80%
  3. Circuit breaker: auto-revert to premium if approval drops >10%
  4. Human review sample: 100 budget-generated ads for manual QA

8.2 Profitability Threshold Risk

Risk: eCPM thresholds miscalibrated, losing money on expensive generations.

Mitigation:

  1. Default to budget tier unless eCPM exceeds 2× generation cost
  2. Continuous profit analysis per tier
  3. Threshold adjustment automation

8.3 Model Availability Risk

Risk: Primary model down or rate-limited, fallback needed.

Mitigation:

  1. Fallback chain: Gemini 2.0 Flash → Groq Llama → Claude Haiku
  2. Cache model availability status (Redis, 1min TTL)
  3. Alert if fallback rate >5%
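The fallback chain can be sketched as a simple ordered walk. The provider names and the `is_healthy` callable (standing in for the Redis availability cache with 1 min TTL) are assumptions for illustration:

```python
def generate_with_fallback(prompt: str, providers: dict, is_healthy) -> tuple:
    """Try each provider in chain order, skipping ones marked unhealthy.

    providers: ordered dict of name -> callable(prompt) -> str
    is_healthy: callable(name) -> bool, e.g. backed by the Redis cache
    """
    errors = {}
    for name, call in providers.items():
        if not is_healthy(name):
            continue  # cached as down, skip without a network call
        try:
            return name, call(prompt)
        except Exception as exc:  # rate limit, timeout, outage
            errors[name] = exc
    raise RuntimeError(f'all providers failed: {errors}')

def flaky(prompt):
    raise TimeoutError('rate limited')

# Toy chain mirroring the mitigation list above
providers = {'gemini-2.0-flash': flaky,
             'groq-llama': lambda p: 'tagline from fallback',
             'claude-haiku': lambda p: 'should not be reached'}
name, text = generate_with_fallback('...', providers, lambda n: True)
print(name)  # groq-llama
```

Logging which provider ultimately served each request makes the >5% fallback-rate alert a one-line aggregation.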

Appendix A: Detailed Pricing Tables

Text Generation (per 1M tokens)

| Model | Provider | Input | Output | Speed | Context |
|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | Fast | 400K |
| Gemini 3 Pro | Google | $2.00 | $12.00 | Fast | 200K |
| Gemini 3 Flash | Google | $0.50 | $3.00 | Very Fast | 1M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | Very Fast | 1M |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | Ultra Fast | 128K |
| Mixtral 8x7B | Groq | $0.27 | $0.27 | Very Fast | 32K |
| Claude Haiku | Anthropic | $1.00 | $5.00 | Fast | 200K |

Image Generation

| Model | Provider | Resolution | Cost | Speed |
|---|---|---|---|---|
| Imagen 3 | Google | 1024×1024 | $0.030 | ~8s |
| FLUX.1 [pro] | Replicate | 1024×1024 | $0.055 | ~10s |
| FLUX.1 [dev] | Replicate | 1024×1024 | $0.030 | ~6s |
| FLUX.1 [schnell] | Replicate | 1024×1024 | $0.003 | ~2s |



End of Document
