LLM cost tracking in Go: a per-model price model on top of token budgets

Turn prompt_tokens / completion_tokens into dollar spend per model, per pipeline, per day — in about 30 lines of Go, no SaaS metering.

Last tested: June 2026. See Changelog at the bottom.

If this saves you setup time, follow @renezander030 — production notes on shipping AI agents outside demos.

Reference implementation (Go, MIT): github.com/renezander030/draftcat

TL;DR cheat sheet

Goal	Do this
Dollar cost per call	`cost = prompt/1000cost_in + completion/1000cost_out` from the response `usage` block
Price table per model	`cost_per_1k_input` / `cost_per_1k_output` in config, keyed by model name
Hard ceiling that can't be breached	pre-flight token check, fail closed (`BUDGET_BLOCKED`) before the call
Why not a dollar `MaxCost` cap?	output tokens are unknown pre-flight — cap tokens, report dollars
Unknown model in the price table	hard stop, never silent zero-cost

The cost-tracking rule of thumb

Three things, in order:

Enforce on tokens, report in dollars. Token counts are knowable before the call (an estimate) and exactly after (usage); dollar cost is derived from them. Gate on tokens, surface dollars.
Keep the price table in config, not code. Per-model prices drift; recompiling to change a price is a smell.
Fail closed. An unpriced model or a breached budget should stop the call, not log a warning and continue.

Recommended setup

A per-model price table in config plus a cost derived from the response usage. Do this before reaching for a dashboard or third-party metering: it is about 30 lines and has zero runtime dependency.

1. Price table per model (steal-able config)

models:
  - name: anthropic/claude-opus-4
    cost_per_1k_input: 0.015
    cost_per_1k_output: 0.075
  - name: anthropic/claude-haiku-4
    cost_per_1k_input: 0.0008
    cost_per_1k_output: 0.004

OpenRouter and OpenAI-style responses both return the same usage shape; only the prices differ.

2. Dollar cost from token usage (Go)

The provider already hands you the token counts. Cost is one expression:

// usage block from the OpenAI / OpenRouter-style response
type Usage struct {
    PromptTokens     int `json:"prompt_tokens"`
    CompletionTokens int `json:"completion_tokens"`
}

// modelCfg.CostIn / CostOut are the cost_per_1k_input / _output from the YAML above
cost := float64(u.PromptTokens)/1000*modelCfg.CostIn +
        float64(u.CompletionTokens)/1000*modelCfg.CostOut

Persist cost, PromptTokens, CompletionTokens, and model per call. That row is your ledger; the daily total is a SUM.

3. The budget that actually stops the call (pre-flight, fail closed)

A dollar cap cannot be enforced strictly before a call — you do not know the output token count yet. So gate on a token budget pre-flight, and keep dollars for the ledger and the daily rollup:

func (b *BudgetTracker) check(limit, requested int) error {
    b.resetIfNewDay()
    if b.tokensUsedToday+requested > limit {
        return fmt.Errorf("BUDGET_BLOCKED: daily token limit %d would be exceeded (used: %d, requested: %d)",
            limit, b.tokensUsedToday, requested)
    }
    return nil
}

requested is a max_output estimate; reconcile with the exact count after the call. Run the same check at three scopes: per step, per pipeline run, per day.

Why token caps instead of a dollar `MaxCost`

Output tokens are unknown pre-flight. A hard dollar ceiling before the call is impossible without bounding max_output anyway, so bound the tokens directly.
Token caps are provider-agnostic. They do not move when a vendor changes prices; your dollar report updates from the config table, the enforcement stays put.
Dollars are the human-facing number. Enforce on the stable quantity (tokens), report the one people care about (USD).

Common failure: `cost` is always 0

cost: 0 almost always means the model was not found in your price table, so cost_in / cost_out defaulted to zero. Treat a missing entry as a hard error, not a silent zero:

mc, ok := priceTable[modelName]
if !ok {
    return fmt.Errorf("no price entry for model %q — refusing to run un-costed", modelName)
}

Smoke test

$ draftcat test triage-email
[ai] model=anthropic/claude-haiku-4 in=812 out=143 cost=$0.00122
[budget] day: 955/200000 tokens, $0.0012

Pass criteria: a non-zero cost=$... on every AI step, and the day total incrementing across runs.

Series

This is Production AI Automation Notes #9. The series, on shipping AI agents outside demos:

Agent Approval Gates
Token Budgets — the token half of this post
Agentic Task System
Driving CapCut / JianYing from an LLM agent
SQLite dedup + crash safety
Prompt-injection defense
PDF cite verification
Stateless JSONL queue runner
LLM cost tracking (this entry)

Reference implementation for the Go entries: draftcat (MIT).

Follow @renezander030 for new entries.

Sources

The token-budget half of this pattern: Production AI Automation Notes #2.
The same gap came up in a go-llm cost-tracking thread — no cost tracking or budget enforcement, same fix.

Reader contributions

Tracking spend differently? Comment with your provider, how you key the price table (model name vs provider+model), and whether you enforce on tokens or dollars.

Changelog

2026-06-12

Initial entry. Skipped the hardware-by-model matrix gate (not hardware-bound) and a new companion repo (draftcat already carries the runnable code).

renezander030/README.md

Select an option

No results found

Select an option

No results found

LLM cost tracking in Go: a per-model price model on top of token budgets

TL;DR cheat sheet

The cost-tracking rule of thumb

Recommended setup

1. Price table per model (steal-able config)

2. Dollar cost from token usage (Go)

3. The budget that actually stops the call (pre-flight, fail closed)

Why token caps instead of a dollar `MaxCost`

Common failure: `cost` is always 0

Smoke test

Series

Sources

Reader contributions

Changelog

2026-06-12

renezander030/README.md

LLM cost tracking in Go: a per-model price model on top of token budgets

TL;DR cheat sheet

The cost-tracking rule of thumb

Recommended setup

1. Price table per model (steal-able config)

2. Dollar cost from token usage (Go)

3. The budget that actually stops the call (pre-flight, fail closed)

Why token caps instead of a dollar MaxCost

Common failure: cost is always 0

Smoke test

Series

Sources

Reader contributions

Changelog

2026-06-12

Why token caps instead of a dollar `MaxCost`

Common failure: `cost` is always 0