@renezander030
Created June 12, 2026 15:45
LLM cost tracking in Go: a per-model price model on top of token budgets

Turn prompt_tokens / completion_tokens into dollar spend per model, per pipeline, per day — in about 30 lines of Go, no SaaS metering.

Last tested: June 2026. See Changelog at the bottom.

If this saves you setup time, follow @renezander030 — production notes on shipping AI agents outside demos.

Reference implementation (Go, MIT): github.com/renezander030/draftcat

TL;DR cheat sheet

| Goal | Do this |
|---|---|
| Dollar cost per call | cost = prompt/1000*cost_in + completion/1000*cost_out, from the response usage block |
| Price table per model | cost_per_1k_input / cost_per_1k_output in config, keyed by model name |
| Hard ceiling that can't be breached | Pre-flight token check; fail closed (BUDGET_BLOCKED) before the call |
| Why not a dollar MaxCost cap? | Output tokens are unknown pre-flight, so cap tokens and report dollars |
| Unknown model in the price table | Hard stop, never a silent zero cost |

The cost-tracking rule of thumb

Three things, in order:

  1. Enforce on tokens, report in dollars. Token counts can be estimated before the call and are known exactly after it (from usage); dollar cost is derived from them. Gate on tokens, surface dollars.
  2. Keep the price table in config, not code. Per-model prices drift; recompiling to change a price is a smell.
  3. Fail closed. An unpriced model or a breached budget should stop the call, not log a warning and continue.

Recommended setup

A per-model price table in config plus a cost derived from the response usage. Do this before reaching for a dashboard or third-party metering: it is about 30 lines of code and has no runtime dependencies.

1. Price table per model (steal-able config)

models:
  - name: anthropic/claude-opus-4
    cost_per_1k_input: 0.015
    cost_per_1k_output: 0.075
  - name: anthropic/claude-haiku-4
    cost_per_1k_input: 0.0008
    cost_per_1k_output: 0.004

OpenRouter returns the same OpenAI-style usage block (prompt_tokens, completion_tokens) as OpenAI, so one cost expression covers both; only the prices differ.
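Keying the table by model name also gives you a natural place to fail closed at load time, before any call runs. A minimal sketch, assuming the YAML above has already been decoded (for example with a YAML library) into a slice; ModelConfig and BuildPriceTable are hypothetical names for illustration, not draftcat's API.

```go
package main

import "fmt"

// ModelConfig mirrors one entry of the models list in the YAML config.
// Hypothetical type; the struct tags assume a YAML decoder that reads `yaml:"..."`.
type ModelConfig struct {
	Name    string  `yaml:"name"`
	CostIn  float64 `yaml:"cost_per_1k_input"`
	CostOut float64 `yaml:"cost_per_1k_output"`
}

// BuildPriceTable indexes the decoded config by model name.
// It fails closed: an empty name, a duplicate name, or a non-positive price
// is a load-time error, so a config typo cannot turn into a silent $0 model.
func BuildPriceTable(models []ModelConfig) (map[string]ModelConfig, error) {
	table := make(map[string]ModelConfig, len(models))
	for _, m := range models {
		if m.Name == "" {
			return nil, fmt.Errorf("price table: entry with empty model name")
		}
		if _, dup := table[m.Name]; dup {
			return nil, fmt.Errorf("price table: duplicate entry for model %q", m.Name)
		}
		if m.CostIn <= 0 || m.CostOut <= 0 {
			return nil, fmt.Errorf("price table: model %q has a non-positive price", m.Name)
		}
		table[m.Name] = m
	}
	return table, nil
}
```

Rejecting zero prices is a deliberate choice here; if you route to genuinely free models, relax the check to negative prices only.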

2. Dollar cost from token usage (Go)

The provider already hands you the token counts. Cost is one expression:

// usage block from the OpenAI / OpenRouter-style response
type Usage struct {
    PromptTokens     int `json:"prompt_tokens"`
    CompletionTokens int `json:"completion_tokens"`
}

// u is the decoded Usage for this call; modelCfg.CostIn / CostOut are the
// cost_per_1k_input / cost_per_1k_output values from the YAML above
cost := float64(u.PromptTokens)/1000*modelCfg.CostIn +
        float64(u.CompletionTokens)/1000*modelCfg.CostOut

Persist cost, PromptTokens, CompletionTokens, and model per call. That row is your ledger; the daily total is a SUM.

3. The budget that actually stops the call (pre-flight, fail closed)

A dollar cap cannot be enforced strictly before a call, because the output token count is not known yet. So gate on a token budget pre-flight, and keep dollars for the ledger and the daily rollup:

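// check fails closed: before the call, it returns a BUDGET_BLOCKED error if
// spending requested more tokens would push today's usage past limit.
func (b *BudgetTracker) check(limit, requested int) error {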
    b.resetIfNewDay()
    if b.tokensUsedToday+requested > limit {
        return fmt.Errorf("BUDGET_BLOCKED: daily token limit %d would be exceeded (used: %d, requested: %d)",
            limit, b.tokensUsedToday, requested)
    }
    return nil
}

requested is an estimate bounded by max_output; after the call, reconcile the counter with the exact count from usage. Run the same check at three scopes: per step, per pipeline run, and per day.

Why token caps instead of a dollar MaxCost

  • Output tokens are unknown pre-flight. A hard dollar ceiling before the call is impossible without bounding max_output anyway, so bound the tokens directly.
  • Token caps are provider-agnostic. They do not move when a vendor changes prices; your dollar report updates from the config table, the enforcement stays put.
  • Dollars are the human-facing number. Enforce on the stable quantity (tokens), report the one people care about (USD).
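The first point is clearer once the worst case is written out: a pre-flight dollar bound only exists as a function of max_output, so the token cap does the real work either way. A sketch with a hypothetical helper; prices are per 1k tokens, as in the config above.

```go
package main

// WorstCaseCost is the upper bound on one call's dollar cost once max_output is
// fixed: the prompt (known or estimated pre-flight) plus the full output
// allowance, both priced per 1k tokens. Hypothetical helper.
func WorstCaseCost(promptTokens, maxOutput int, costIn, costOut float64) float64 {
	return float64(promptTokens)/1000*costIn + float64(maxOutput)/1000*costOut
}
```

With the Haiku prices from the example table, an 812-token prompt and max_output = 1024 bound the call at 0.0006496 + 0.004096 = $0.0047456. Remove the max_output bound and there is no finite dollar figure left to check against.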

Common failure: cost is always 0

cost: 0 almost always means the model was not found in your price table, so cost_in / cost_out defaulted to zero. Treat a missing entry as a hard error, not a silent zero:

mc, ok := priceTable[modelName]
if !ok {
    return fmt.Errorf("no price entry for model %q — refusing to run un-costed", modelName)
}
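Splitting the lookup from the arithmetic lets you run the lookup pre-flight, so an unpriced model never gets to make a call, and apply the price to exact usage afterwards. A minimal sketch; Price, PriceFor, and Cost are hypothetical names, and Usage mirrors the OpenAI / OpenRouter-style block shown earlier.

```go
package main

import "fmt"

// Price is one price-table entry, in dollars per 1k tokens (hypothetical type).
type Price struct {
	CostIn, CostOut float64
}

// Usage mirrors the OpenAI / OpenRouter-style usage block.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
}

// PriceFor is the pre-flight lookup. A missing key is an error rather than
// the map's zero value, which is exactly what produces "cost: 0".
func PriceFor(table map[string]Price, model string) (Price, error) {
	p, ok := table[model]
	if !ok {
		return Price{}, fmt.Errorf("no price entry for model %q, refusing to run un-costed", model)
	}
	return p, nil
}

// Cost converts exact post-call usage into dollars.
func (p Price) Cost(u Usage) float64 {
	return float64(u.PromptTokens)/1000*p.CostIn +
		float64(u.CompletionTokens)/1000*p.CostOut
}
```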

Smoke test

$ draftcat test triage-email
[ai] model=anthropic/claude-haiku-4 in=812 out=143 cost=$0.00122
[budget] day: 955/200000 tokens, $0.0012

Pass criteria: a non-zero cost=$... on every AI step, and the day total incrementing across runs.
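The sample output checks out by hand: 812/1000*0.0008 + 143/1000*0.004 = 0.0006496 + 0.000572 = $0.0012216, printed as $0.00122, and 812 + 143 = 955 tokens for the day. If you want the pass criteria in CI rather than by eye, they reduce to two assertions. A sketch, assuming you can collect each AI step's model and cost and read the day total before and after the run; StepResult and CheckSmoke are hypothetical names.

```go
package main

import "fmt"

// StepResult is the per-step data the smoke test prints (hypothetical shape).
type StepResult struct {
	Model string
	Cost  float64
}

// CheckSmoke applies the pass criteria: every AI step has a non-zero cost,
// and the day total after the run is strictly greater than before it.
func CheckSmoke(steps []StepResult, dayBefore, dayAfter float64) error {
	if len(steps) == 0 {
		return fmt.Errorf("no AI steps ran")
	}
	for i, s := range steps {
		if s.Cost <= 0 {
			return fmt.Errorf("step %d (%s): cost=$%g, check the price table", i, s.Model, s.Cost)
		}
	}
	if dayAfter <= dayBefore {
		return fmt.Errorf("day total did not increase: before=$%g after=$%g", dayBefore, dayAfter)
	}
	return nil
}
```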


Series

This is Production AI Automation Notes #9. The series, on shipping AI agents outside demos:

  1. Agent Approval Gates
  2. Token Budgets — the token half of this post
  3. Agentic Task System
  4. Driving CapCut / JianYing from an LLM agent
  5. SQLite dedup + crash safety
  6. Prompt-injection defense
  7. PDF cite verification
  8. Stateless JSONL queue runner
  9. LLM cost tracking (this entry)

Reference implementation for the Go entries: draftcat (MIT).

Follow @renezander030 for new entries.

Sources

Reader contributions

Tracking spend differently? Comment with your provider, how you key the price table (model name vs provider+model), and whether you enforce on tokens or dollars.

Changelog

2026-06-12

  • Initial entry. Skipped the hardware-by-model matrix gate (not hardware-bound) and a new companion repo (draftcat already carries the runnable code).