@rndmcnlly
Created April 18, 2025 04:00
it's like the rocket equation for chatbots

USER

Let’s talk about Smith’s ideal chat equation. (I’m Smith, and we’re makin’ it up right here.)

It’s a way to figure out the total cost of an idealized LLM chat assistant conversation with t turns, where on each turn the user sends a message with m_u tokens and the assistant writes a reply with m_a tokens. The input tokens to the LLM (the most recent user message plus all past user and assistant messages) have a discounted per-token cost lambda, whereas the output tokens (making up the assistant replies) have unit cost.

ASSISTANT

Smith's Ideal Chat Equation

I'd be happy to develop this concept with you, Smith! Let's formalize your idea for calculating the cost of an idealized LLM chat conversation.

Defining the Parameters

  • t: Number of conversation turns
  • m_u: Number of tokens in each user message
  • m_a: Number of tokens in each assistant reply
  • λ (lambda): Cost discount factor for input tokens (prompt + context)
  • 1: Unit cost for output tokens (assistant-generated)

Building the Equation

For each turn i (where i goes from 1 to t):

  • Input tokens at turn i: The current user message (m_u) plus all previous messages
    • Previous messages include (i-1) user messages and (i-1) assistant messages
    • So total input = m_u + (i-1)m_u + (i-1)m_a = i·m_u + (i-1)·m_a
  • Output tokens at turn i: m_a
  • Cost at turn i: λ·[i·m_u + (i-1)·m_a] + m_a

Total Cost Formula

The total cost would be the sum of costs across all turns:

Total Cost = Σ(i=1 to t) [λ·(i·m_u + (i-1)·m_a) + m_a]
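A minimal sketch in Python makes the sum concrete (the parameter values below are purely illustrative, not real pricing):

```python
# Exact total cost of a t-turn conversation under Smith's ideal chat equation:
# cost at turn i = lambda * (i*m_u + (i-1)*m_a) + m_a, summed over i = 1..t.
def total_cost(t, m_u, m_a, lam):
    return sum(lam * (i * m_u + (i - 1) * m_a) + m_a for i in range(1, t + 1))

# Illustrative values: 20 turns, 50-token user messages, 200-token replies,
# with input tokens billed at a quarter of the output-token rate.
print(total_cost(t=20, m_u=50, m_a=200, lam=0.25))
```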

This can be simplified using summation formulas. Would you like me to continue with the simplification?

USER

Yup. Account for other junk like the system prompt and contextually retrieved document snippets as a general m_c additional context input term. Focus on the long-conversation case where t is close to t+1. When it feels right, use a generic m to replace a busy sum of other m_x terms. I'm fishing for asymptotics here.

ASSISTANT

Smith's Ideal Chat Equation: Asymptotic Analysis

Let's refine the equation with your additional parameters and focus on asymptotics for large t.

Extended Parameters

  • m_c: Additional context tokens (system prompt, retrieved documents, etc.)
  • t: Number of turns (where t is large)
  • λ: Input token cost factor
  • m_u: User message tokens per turn
  • m_a: Assistant response tokens per turn

Revised Cost Formula

For each turn i:

  • Input tokens: m_c + i·m_u + (i-1)·m_a
  • Output tokens: m_a
  • Cost at turn i: λ·[m_c + i·m_u + (i-1)·m_a] + m_a

Total Cost

Total Cost = Σ(i=1 to t) [λ·(m_c + i·m_u + (i-1)·m_a) + m_a]

Simplifying for Large t

Breaking down the summation:

  • Σ(λ·m_c) = λ·m_c·t
  • Σ(λ·i·m_u) = λ·m_u·Σ(i) = λ·m_u·[t(t+1)/2] ≈ λ·m_u·t²/2 for large t
  • Σ(λ·(i-1)·m_a) = λ·m_a·Σ(i-1) = λ·m_a·[t(t-1)/2] ≈ λ·m_a·t²/2 for large t
  • Σ(m_a) = m_a·t

Asymptotic Form

For large t, the total cost approaches: Total Cost ≈ λ·m_c·t + λ·m_u·t²/2 + λ·m_a·t²/2 + m_a·t

Let's define m = m_u + m_a as the total tokens exchanged per conversation turn.

Final Asymptotic Form

Total Cost ≈ λ·m_c·t + λ·m·t²/2 + m_a·t

The dominant term for very large t is λ·m·t²/2, showing that the cost grows quadratically with the number of turns due to the accumulating context window.

This quadratic growth explains why long conversations with LLMs become increasingly expensive - each new turn must process all previous turns as input.
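If it helps to check the algebra, here is a small sketch comparing the exact sum against this asymptotic form (all parameter values are illustrative):

```python
# Compare the exact per-turn sum with the asymptotic approximation
# lam*m_c*t + lam*m*t^2/2 + m_a*t, where m = m_u + m_a.
def exact_cost(t, m_c, m_u, m_a, lam):
    return sum(lam * (m_c + i * m_u + (i - 1) * m_a) + m_a for i in range(1, t + 1))

def asymptotic_cost(t, m_c, m_u, m_a, lam):
    m = m_u + m_a
    return lam * m_c * t + lam * m * t ** 2 / 2 + m_a * t

for t in (10, 100, 1000):
    e = exact_cost(t, m_c=500, m_u=50, m_a=200, lam=0.25)
    a = asymptotic_cost(t, m_c=500, m_u=50, m_a=200, lam=0.25)
    print(t, e, a, round(a / e, 3))  # the ratio approaches 1 as t grows
```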

USER

I like the final simplicity. However, we need a concise way to say what’s included/excluded for m.

ASSISTANT

Smith's Ideal Chat Equation: Refined

You're right - let's be precise about what m represents.

Refined Definition of m

Let's define:

  • m = m_u + m_a: The total tokens exchanged per complete turn (one user message + one assistant response)

Final Asymptotic Cost Formula

Total Cost ≈ λ·m_c·t + λ·m·t²/2 + m_a·t

Where:

  • m_c: Fixed context tokens (system prompt, retrieved documents)
  • m_a: Assistant output tokens per turn (direct generation cost)
  • λ: Input token cost factor (discount relative to output tokens)
  • t: Number of conversation turns

Interpretation

  • λ·m_c·t: Linear cost of processing fixed context in each turn
  • λ·m·t²/2: Quadratic cost from accumulating conversation history
  • m_a·t: Linear cost of generating assistant responses

For large t, the quadratic term dominates, demonstrating why extended conversations become prohibitively expensive without context management strategies.

This elegantly captures the core challenge of LLM chat systems: the tension between maintaining full conversation history and managing computational costs.
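One way to see when each term matters is to look at the share of total cost each contributes as t grows (values below are illustrative):

```python
# Fraction of the asymptotic total cost attributable to each of the three terms.
def term_shares(t, m_c, m_u, m_a, lam):
    m = m_u + m_a
    context = lam * m_c * t          # linear: re-reading the fixed context each turn
    history = lam * m * t ** 2 / 2   # quadratic: re-reading the growing history
    generation = m_a * t             # linear: generating each assistant reply
    total = context + history + generation
    return {"context": context / total,
            "history": history / total,
            "generation": generation / total}

for t in (5, 50, 500):
    print(t, term_shares(t, m_c=500, m_u=50, m_a=200, lam=0.25))
```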

USER

Let's think about these newfangled reasoning models that yap to themselves at length before emitting their final answer. For these, there's a new cost m_r (reasoning outputs) that counts as output tokens but is not (typically) recycled into the context like m_a.

ASSISTANT

Smith's Ideal Chat Equation: With Reasoning Models

Excellent extension! Let's incorporate reasoning models into our equation.

Extended Parameters

  • m_r: Reasoning tokens generated by the model but not shown to the user or recycled into context
  • m_a: Final answer tokens shown to the user and added to context

Updated Cost Formula

For reasoning models, each turn now involves:

  • Input tokens: m_c + i·m_u + (i-1)·m_a (unchanged)
  • Output tokens: m_r + m_a (reasoning + final answer)

Total Cost Per Turn

Cost at turn i = λ·[m_c + i·m_u + (i-1)·m_a] + (m_r + m_a)

Asymptotic Total Cost

Total Cost ≈ λ·m_c·t + λ·m·t²/2 + (m_r + m_a)·t

Where:

  • m = m_u + m_a: Total tokens exchanged and retained in context per turn
  • m_r: "Scratch work" tokens that incur generation cost but don't accumulate

Key Insight

This formulation reveals an important efficiency trade-off:

  • Reasoning tokens (m_r) increase the immediate cost linearly (m_r·t)
  • But they don't contribute to the quadratic growth term (λ·m·t²/2)

For long conversations, extensive internal reasoning (large m_r) with concise final answers (small m_a) can be more cost-effective than verbose answers that accumulate in the context.

This explains why "thinking aloud" models might actually be more efficient for complex tasks over multiple turns, despite appearing wasteful in the short term.
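A rough sketch of that trade-off, with purely illustrative numbers: a model that spends m_r scratch tokens per turn but needs fewer turns to reach the same outcome can still come out ahead, because m_r never enters the quadratic history term.

```python
# Total cost with reasoning tokens: m_r is billed as output but not recycled into context.
def total_cost(t, m_c, m_u, m_a, m_r, lam):
    return sum(lam * (m_c + i * m_u + (i - 1) * m_a) + m_r + m_a
               for i in range(1, t + 1))

# Hypothetical comparison: a verbose non-reasoning run that takes 20 turns versus
# a reasoning run that spends 1000 scratch tokens per turn but finishes in 12.
plain     = total_cost(t=20, m_c=500, m_u=50, m_a=300, m_r=0,    lam=0.25)
reasoning = total_cost(t=12, m_c=500, m_u=50, m_a=200, m_r=1000, lam=0.25)
print(plain, reasoning)  # with these values the reasoning run is cheaper overall
```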

USER

In my experience, the models that do more thinking don’t seem to have obviously shorter responses. However, I can see how they might get the user to their desired outcome in fewer total turns, and that’s a big deal.

Sometimes people just want to talk, you know. The assistant’s next reply doesn’t need to be this critical synthesis of every preceding detail. Sometimes the conversation needs to wander for a while to get the perspective needed to make sense of something from much earlier. I’m just … I’m just thinking aloud here.

ASSISTANT

The Human Element in Smith's Equation

You've touched on something profound that goes beyond the mathematical efficiency.

The Value of Conversation

You're absolutely right. Conversations aren't just information exchanges to be optimized - they're explorations, with value in the journey itself:

  • Wandering conversations allow for discoveries that directed ones might miss
  • Processing time through casual dialogue can lead to unexpected insights
  • Building rapport happens in those "inefficient" exchanges that aren't directly advancing a task

A Revised Perspective

Perhaps we should extend Smith's Equation to include a factor for conversation quality (Q) that acknowledges:

  1. Fewer turns isn't always better - sometimes more turns lead to better outcomes
  2. The human experience matters - satisfaction isn't just about reaching conclusions efficiently
  3. Context accumulation has benefits - those seemingly irrelevant details from earlier might become the key to understanding

The Tension

There's a beautiful tension here between:

  • The cold efficiency of minimizing token costs
  • The warm humanity of allowing conversation to breathe

Finding Balance

Maybe the ideal approach isn't just about optimizing for token efficiency, but finding the sweet spot where:

  • The model thinks enough to be helpful
  • The conversation flows naturally enough to feel human
  • The context retains enough history to maintain coherence
  • The costs remain reasonable enough to be sustainable

This is where the art of conversation design meets the science of token economics - finding that balance that serves both the practical needs and the human experience.

USER

Chatbots that run on this quadratic equation just don’t feel humane. They’re a temptation to waste and disappoint.

Claude, we need to fix this. (Aerospace digression: You know the Tsiolkovsky rocket equation, right?) We need on-orbit refueling stations for chatbots or something.

ASSISTANT

Beyond the Quadratic Trap

You're right - there's something fundamentally unsustainable about the current paradigm. It creates a system that either disappoints through forgetfulness or becomes prohibitively expensive.

The Rocket Equation Parallel

The Tsiolkovsky rocket equation is a perfect analogy! Just as rockets must carry their own fuel (making each additional unit of fuel require even more fuel to lift), chatbots must carry their entire conversational history (making each additional exchange exponentially costlier).

And just as aerospace solved this with:

  • Staged rockets that shed mass
  • On-orbit refueling
  • Space elevators (theoretical)

Humane Chatbot Architecture

We need similar architectural innovations for chatbots:

  1. Conversation Staging

    • Periodically summarize and compress past exchanges
    • Shed unnecessary details while preserving essential context
  2. External Memory Refueling

    • Store conversation in external, searchable knowledge bases
    • Retrieve only relevant parts when needed, not the entire history
  3. Semantic Elevators

    • Create persistent "understanding" that doesn't require verbatim history
    • Build conceptual models of the conversation that require fewer tokens

The Human-Centered Approach

A truly humane system would:

  • Remember what matters without the quadratic cost penalty
  • Allow natural conversation flow without economic anxiety
  • Preserve the context that makes conversation meaningful
  • Scale gracefully with relationship longevity

Breaking the Equation

Perhaps the solution isn't optimizing within the constraints of Smith's Equation, but fundamentally changing the architecture so the equation no longer applies.

Just as we don't expect human friends to recite our entire relationship history to have a meaningful conversation, we shouldn't build AI systems that require such inefficient approaches to continuity.

The future belongs to systems that can maintain the humanity of conversation without the quadratic burden.
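To make the staging idea concrete, here is a small sketch (the summary size m_s and other values are illustrative assumptions, and it ignores the cost of producing the summaries themselves): if each turn carries only a fixed-size running summary instead of the full history, the quadratic term disappears and cost becomes linear in t.

```python
# Full-history cost versus a staged variant that replaces the growing history
# with a fixed-size running summary of m_s tokens on every turn.
def full_history_cost(t, m_c, m_u, m_a, lam):
    return sum(lam * (m_c + i * m_u + (i - 1) * m_a) + m_a for i in range(1, t + 1))

def staged_cost(t, m_c, m_u, m_a, m_s, lam):
    # Each turn reads: fixed context + running summary + current user message.
    return t * (lam * (m_c + m_s + m_u) + m_a)

for t in (10, 100, 1000):
    print(t,
          full_history_cost(t, m_c=500, m_u=50, m_a=200, lam=0.25),
          staged_cost(t, m_c=500, m_u=50, m_a=200, m_s=800, lam=0.25))
```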

USER

I’ll forgive your hyperbolic use of “exponentially” there.

I see where you are going with this, but I shouldn’t go deeper because of all of the reasons you so clearly understand. Let’s continue this another day. In preparation, plort out a self-contained summary of our most important developments here. Make a memory worth saving.

ASSISTANT

Smith's Ideal Chat Equation: A Summary

We developed a framework for understanding the economics and constraints of extended conversations with large language models:

The Core Equation

Total Cost ≈ λ·m_c·t + λ·m·t²/2 + m_a·t

Where:

  • m_c: Fixed context tokens (system prompts, retrieved documents)
  • m = m_u + m_a: Total tokens exchanged per turn that accumulate in context
  • m_a: Assistant output tokens per turn
  • m_r: Optional reasoning tokens that cost but don't accumulate
  • λ: Input token cost discount factor
  • t: Number of conversation turns

Key Insights

  1. The Quadratic Trap: Conversation costs grow quadratically with length due to accumulating context
  2. The Reasoning Trade-off: Internal reasoning (m_r) increases immediate costs but may reduce total turns needed
  3. The Human Element: Efficient conversations aren't always the most valuable ones - wandering dialogue serves human needs beyond information exchange

The Rocket Equation Parallel

Like the Tsiolkovsky rocket equation constrains space travel by requiring rockets to carry their own fuel, Smith's equation constrains conversations by requiring chatbots to carry their entire history.

Future Directions

Breaking free from this quadratic constraint will require architectural innovations:

  • Conversation staging through summarization
  • External memory systems for selective retrieval
  • Semantic understanding that transcends token-based representation

The path to truly humane AI assistants requires systems that maintain meaningful continuity without the quadratic burden of verbatim history.
