Turn prompt_tokens / completion_tokens into dollar spend per model, per pipeline, per day — in about 30 lines of Go, no SaaS metering.
Last tested: June 2026. See Changelog at the bottom.
If this saves you setup time, follow @renezander030 — production notes on shipping AI agents outside demos.
Reference implementation (Go, MIT): github.com/renezander030/draftcat
| Goal | Do this |
|---|---|
| Dollar cost per call | cost = prompt/1000*cost_in + completion/1000*cost_out from the response usage block |
| Price table per model | cost_per_1k_input / cost_per_1k_output in config, keyed by model name |
| Hard ceiling that can't be breached | pre-flight token check, fail closed (BUDGET_BLOCKED) before the call |
| Why not a dollar MaxCost cap? | output tokens are unknown pre-flight; cap tokens, report dollars |
| Unknown model in the price table | hard stop, never silent zero-cost |
Three things, in order:
- Enforce on tokens, report in dollars. Token counts are knowable before the call (an estimate) and exactly after (`usage`); dollar cost is derived from them. Gate on tokens, surface dollars.
- Keep the price table in config, not code. Per-model prices drift; recompiling to change a price is a smell.
- Fail closed. An unpriced model or a breached budget should stop the call, not log a warning and continue.
The minimum is a per-model price table in config plus a cost derived from the response `usage` block. Do this before reaching for a dashboard or third-party metering: it is about 30 lines and adds no runtime dependencies.
```yaml
models:
  - name: anthropic/claude-opus-4
    cost_per_1k_input: 0.015
    cost_per_1k_output: 0.075
  - name: anthropic/claude-haiku-4
    cost_per_1k_input: 0.0008
    cost_per_1k_output: 0.004
```

OpenRouter and OpenAI-style responses both return the same `usage` shape; only the prices differ.
The provider already hands you the token counts. Cost is one expression:
```go
// usage block from the OpenAI / OpenRouter-style response
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
}

// u is the decoded Usage for this call.
// modelCfg.CostIn / CostOut are the cost_per_1k_input / _output from the YAML above
cost := float64(u.PromptTokens)/1000*modelCfg.CostIn +
	float64(u.CompletionTokens)/1000*modelCfg.CostOut
```

Persist `cost`, `PromptTokens`, `CompletionTokens`, and `model` per call. That row is your ledger; the daily total is a `SUM`.
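To make the ledger idea concrete, here is a minimal in-memory sketch of the cost expression plus a daily rollup. The names `ModelPrice`, `LedgerRow`, `callCost`, and `dailyTotal` are assumptions made for this example, not taken from draftcat, and a real ledger would more likely be a database table than a slice.

```go
package main

import (
	"fmt"
	"time"
)

// Usage mirrors the usage block of an OpenAI / OpenRouter-style response.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
}

// ModelPrice holds the per-1k-token prices from the config table.
// The type name is an assumption for this sketch.
type ModelPrice struct {
	CostIn  float64 // USD per 1,000 input tokens
	CostOut float64 // USD per 1,000 output tokens
}

// LedgerRow is one persisted record per call (hypothetical shape).
type LedgerRow struct {
	Day              string // YYYY-MM-DD, UTC
	Model            string
	PromptTokens     int
	CompletionTokens int
	CostUSD          float64
}

// callCost derives the dollar cost of one call from its usage block.
func callCost(u Usage, p ModelPrice) float64 {
	return float64(u.PromptTokens)/1000*p.CostIn +
		float64(u.CompletionTokens)/1000*p.CostOut
}

// dailyTotal sums the cost of every row recorded for the given day.
func dailyTotal(rows []LedgerRow, day string) float64 {
	var total float64
	for _, r := range rows {
		if r.Day == day {
			total += r.CostUSD
		}
	}
	return total
}

func main() {
	haiku := ModelPrice{CostIn: 0.0008, CostOut: 0.004}
	u := Usage{PromptTokens: 812, CompletionTokens: 143}
	day := time.Now().UTC().Format("2006-01-02")

	ledger := []LedgerRow{{
		Day: day, Model: "anthropic/claude-haiku-4",
		PromptTokens: u.PromptTokens, CompletionTokens: u.CompletionTokens,
		CostUSD: callCost(u, haiku),
	}}
	fmt.Printf("cost=$%.5f day=$%.4f\n", ledger[0].CostUSD, dailyTotal(ledger, day))
}
```

With the haiku prices from the table and 812 input / 143 output tokens, the call works out to about $0.00122. Keying rows by a UTC day string is a choice made for the sketch; use whatever day boundary your budget reset uses.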
A dollar cap cannot be enforced strictly before a call — you do not know the output token count yet. So gate on a token budget pre-flight, and keep dollars for the ledger and the daily rollup:
```go
func (b *BudgetTracker) check(limit, requested int) error {
	b.resetIfNewDay()
	if b.tokensUsedToday+requested > limit {
		return fmt.Errorf("BUDGET_BLOCKED: daily token limit %d would be exceeded (used: %d, requested: %d)",
			limit, b.tokensUsedToday, requested)
	}
	return nil
}
```

`requested` is a `max_output` estimate; reconcile with the exact count after the call. Run the same check at three scopes: per step, per pipeline run, per day.
- Output tokens are unknown pre-flight. A hard dollar ceiling before the call is impossible without bounding `max_output` anyway, so bound the tokens directly.
- Token caps are provider-agnostic. They do not move when a vendor changes prices; your dollar report updates from the config table, the enforcement stays put.
- Dollars are the human-facing number. Enforce on the stable quantity (tokens), report the one people care about (USD).
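The three points above can be checked with a few lines of arithmetic. In this sketch, `worstCaseCost` is a hypothetical helper, and the halved prices are an invented price change used only for illustration: the token cap stays the same while the derived dollar bound moves with the table.

```go
package main

import "fmt"

// worstCaseCost returns the upper bound on one call's dollar cost once the
// output is capped at maxOutputTokens. Prices are USD per 1,000 tokens.
func worstCaseCost(promptTokens, maxOutputTokens int, costIn, costOut float64) float64 {
	return float64(promptTokens)/1000*costIn +
		float64(maxOutputTokens)/1000*costOut
}

func main() {
	const promptTokens, maxOutputTokens = 2000, 1000 // the token cap being enforced

	// Prices from the opus entry in the config table.
	before := worstCaseCost(promptTokens, maxOutputTokens, 0.015, 0.075)
	// Hypothetical price change: the vendor halves both prices.
	after := worstCaseCost(promptTokens, maxOutputTokens, 0.0075, 0.0375)

	fmt.Printf("token cap: %d, worst case before: $%.4f, after: $%.4f\n",
		promptTokens+maxOutputTokens, before, after)
}
```

The enforced quantity (3,000 tokens) never changes between the two runs; only the reported dollar ceiling does.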
cost: 0 almost always means the model was not found in your price table, so cost_in / cost_out defaulted to zero. Treat a missing entry as a hard error, not a silent zero:
```go
mc, ok := priceTable[modelName]
if !ok {
	return fmt.Errorf("no price entry for model %q; refusing to run un-costed", modelName)
}
```

```text
$ draftcat test triage-email
[ai] model=anthropic/claude-haiku-4 in=812 out=143 cost=$0.00122
[budget] day: 955/200000 tokens, $0.0012
```
Pass criteria: a non-zero cost=$... on every AI step, and the day total incrementing across runs.
This is Production AI Automation Notes #9. The series, on shipping AI agents outside demos:
- Agent Approval Gates
- Token Budgets — the token half of this post
- Agentic Task System
- Driving CapCut / JianYing from an LLM agent
- SQLite dedup + crash safety
- Prompt-injection defense
- PDF cite verification
- Stateless JSONL queue runner
- LLM cost tracking (this entry)
Reference implementation for the Go entries: draftcat (MIT).
Follow @renezander030 for new entries.
- The token-budget half of this pattern: Production AI Automation Notes #2.
- The same gap came up in a go-llm cost-tracking thread — no cost tracking or budget enforcement, same fix.
Tracking spend differently? Comment with your provider, how you key the price table (model name vs provider+model), and whether you enforce on tokens or dollars.
- Initial entry. Skipped the hardware-by-model matrix gate (not hardware-bound) and a new companion repo (draftcat already carries the runnable code).