Codeburn - token cost TUI for Claude Code & Codex
Interactive terminal dashboard that breaks down where your AI coding tokens actually go. Surfaces the chat-vs-tool-use split most users get wrong.
If you've ever stared at an Anthropic invoice and wondered why your agent burned through a week's budget refactoring a single React component, Codeburn is the diagnostic. It is a local TUI that tells you exactly where your AI coding tokens went, broken down by tool, model, project, and task type.
The thing the dashboard surfaces - and the thing that makes most users do a double-take - is the conversation-vs-coding split. One user's published breakdown was 56% pure conversation and only 20% actual coding output. You don't realise how much of an agent run is the model talking to itself until you measure it.
How it gathers data
No proxy, no wrapper, no API keys. Codeburn reads session transcripts directly off disk and prices each call against the LiteLLM rate sheet. The on-disk locations it knows about cover 16 tools, including:
- Claude Code - ~/.claude/projects/
- Claude Desktop - ~/Library/Application Support/Claude/local-agent-mode-sessions/
- Codex - ~/.codex/sessions/
- Cursor - SQLite at ~/Library/Application Support/Cursor/User/globalStorage/state.vscdb
- Gemini CLI - ~/.gemini/tmp/<project>/chats/
- GitHub Copilot - ~/.copilot/session-state/ plus VS Code workspace storage
- OpenCode - SQLite at ~/.local/share/opencode/
- Plus Roo Code, KiloCode, Droid, Qwen, Pi, OMP, OpenClaw, Kiro
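To see which of these locations exist on a given machine before running the tool, a small check like the following can help. It is a minimal sketch, not Codeburn's own discovery logic: the directory paths are copied from the list above, while the function name `found_locations` and its `home` parameter are hypothetical and exist only for this illustration.

```python
from pathlib import Path

# Directory locations quoted from the list above. The selection is illustrative;
# any path that does not exist on this machine is simply skipped.
KNOWN_LOCATIONS = {
    "Claude Code": "~/.claude/projects/",
    "Codex": "~/.codex/sessions/",
    "GitHub Copilot": "~/.copilot/session-state/",
    "OpenCode": "~/.local/share/opencode/",
}


def found_locations(locations=KNOWN_LOCATIONS, home=None):
    """Return {tool: resolved_path} for every listed directory that exists.

    `home` is a hypothetical override for the home directory, mainly so the
    function can be exercised against a scratch directory.
    """
    base = Path(home) if home is not None else Path.home()
    found = {}
    for tool, raw in locations.items():
        # Swap the leading "~/" for the chosen home directory.
        candidate = base / raw.removeprefix("~/")
        if candidate.is_dir():
            found[tool] = candidate
    return found


if __name__ == "__main__":
    for tool, path in found_locations().items():
        print(f"{tool}: {path}")
```

An empty result means none of the listed directories were found, in which case there is likely nothing on disk for a transcript reader to pick up from those tools.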
Because everything is read from local disk, the privacy story is straightforward: Codeburn only touches data that is already on your machine, and you can run it offline.
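The pricing step described in this section comes down to multiplying token counts by per-token rates. The sketch below shows that arithmetic under stated assumptions: the `Call` record, the model name `example-model`, and the rates are all hypothetical placeholders, not values from the LiteLLM rate sheet and not Codeburn's internal data model.

```python
from dataclasses import dataclass

# Hypothetical USD rates per million tokens. Placeholder values for
# illustration only; real rates would come from a rate sheet.
RATES_PER_MILLION = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class Call:
    """One model call as it might be reconstructed from a transcript (assumed shape)."""
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(call: Call, rates=RATES_PER_MILLION) -> float:
    """USD cost of a single call: tokens times rate, per direction."""
    rate = rates[call.model]
    total = call.input_tokens * rate["input"] + call.output_tokens * rate["output"]
    return total / 1_000_000


def session_cost(calls, rates=RATES_PER_MILLION) -> float:
    """USD cost of a whole session: the sum over its calls."""
    return sum(call_cost(c, rates) for c in calls)


calls = [
    Call("example-model", input_tokens=12_000, output_tokens=800),
    Call("example-model", input_tokens=40_000, output_tokens=2_500),
]
print(f"${session_cost(calls):.4f}")
```

Real rate sheets typically distinguish more token classes (cached input, for example), so treat the two-rate model here as the simplest possible case.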
Quick start
npm install -g codeburn
npx codeburn
The default view is the last 7 days as an interactive dashboard. The single-purpose subcommands are useful for scripting:
codeburn today # today only
codeburn month # current month
codeburn report -p 30days
codeburn status # summary line, e.g. for shell prompts
codeburn export # CSV / JSON dump
codeburn optimize # surfaces wasted spend
codeburn compare # cost across models for the same workload
codeburn yield # cost per shipped change
In the TUI, arrow keys rotate between Today / 7 / 30 / Month / All Time, 1–5 are shortcuts for those, c opens the model comparison, o opens the optimize report, p toggles providers, and q quits. The dashboard auto-refreshes every 30 seconds; tune it with --refresh.
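For scripting, the subcommands can be called from another program like any other CLI. The following is a minimal sketch that wraps `codeburn status`, assuming only what is stated above: that the subcommand prints a summary line. The wrapper name `codeburn_status` is hypothetical, and since the exact output format is not documented here, the line is passed through unparsed.

```python
import shutil
import subprocess


def codeburn_status(binary="codeburn", timeout=10):
    """Return the summary line printed by `codeburn status`, or None.

    None is returned when the binary is not on PATH, the call fails,
    times out, or prints nothing. The output is not parsed because its
    format is not specified in this article.
    """
    path = shutil.which(binary)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "status"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    line = codeburn_status()
    print(line if line is not None else "codeburn not available")
```

Returning None instead of raising keeps the wrapper safe to call from a prompt or status bar, where a missing binary should not break the surrounding script.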
What you actually get from it
Codeburn classifies every call into one of 13 categories - Coding, Debugging, Feature Dev, Refactoring, Testing, Exploration, Planning, Delegation, Git Ops, Build/Deploy, Brainstorming, Conversation, and General. Two of those categories, plus one derived metric, carry most of the signal:
- Conversation - back-and-forth that doesn't change code. Anything north of ~30% is a yellow flag.
- Exploration - the agent re-reading files. High exploration cost is a strong argument for plugging in something like a code-graph MCP.
- One-shot rate - percent of edits that landed without a retry. Low one-shot rate means the model is guessing.
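The three signals above are simple ratios. The sketch below shows how they could be computed from categorized records; the record shape, the sample numbers, and the function names are assumptions made for this illustration, not Codeburn's export format. The 30% threshold is the yellow-flag level quoted in the list.

```python
from collections import defaultdict

# Hypothetical records: (category, cost in USD). Category names follow the
# article; the tuple shape and the values are invented for illustration.
calls = [
    ("Conversation", 4.20),
    ("Coding", 1.50),
    ("Exploration", 1.10),
    ("Conversation", 0.70),
    ("Debugging", 0.50),
]

# Hypothetical edit outcomes: True if the edit landed without a retry.
edit_outcomes = [True, False, True, True]

CONVERSATION_FLAG = 0.30  # "north of ~30%" from the list above


def cost_share(records):
    """Fraction of total cost per category. Assumes a non-empty, non-zero total."""
    totals = defaultdict(float)
    for category, cost in records:
        totals[category] += cost
    grand_total = sum(totals.values())
    return {category: cost / grand_total for category, cost in totals.items()}


def one_shot_rate(outcomes):
    """Fraction of edits that landed on the first attempt. Assumes at least one edit."""
    return sum(outcomes) / len(outcomes)


shares = cost_share(calls)
conversation_flagged = shares["Conversation"] > CONVERSATION_FLAG
print(f"Conversation share: {shares['Conversation']:.1%} (flagged: {conversation_flagged})")
print(f"Exploration share: {shares['Exploration']:.1%}")
print(f"One-shot rate: {one_shot_rate(edit_outcomes):.0%}")
```

Shares are computed on cost rather than call count here, which weights a few expensive calls more heavily than many cheap ones; either basis is defensible, so state which one a report uses.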
Per-project and per-model breakdowns are the part you'll come back to. They're the difference between "Claude Code is expensive" and "Claude Code is expensive on this one repo because the AGENTS.md is bad."
When to reach for it
- Before changing models or providers - you want a baseline so the comparison means something.
- When monthly spend has crept up and you can't point at a cause.
- If you're building tooling on top of agents and need to attribute cost to features.
When not to
- For real-time enforcement. Codeburn is a read-only diagnostic - it doesn't gate or rate-limit. Pair it with a proxy if you need policy enforcement.
- For tools that don't persist transcripts to disk. If you only use a web-based agent, there's nothing for Codeburn to read.
Known accuracy caveats
A few of the tools obscure their numbers, so Codeburn estimates those costs rather than measuring them:
- Cursor - when "Auto" mode hides the actual model, costs are estimated at Sonnet rates. First run on a large Cursor SQLite database can take up to a minute.
- GitHub Copilot in VS Code - no explicit token counts in the format; estimated from content length.
- Kiro - same story, costed at Sonnet rates.
Treat the absolute numbers as ±10–20% on those tools. The relative breakdown - which projects, which task types, which days - is still the part that matters and is unaffected.
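To make the caveat concrete, the sketch below estimates tokens from content length and attaches the error band quoted above. The 4-characters-per-token ratio is a common rule of thumb for English text and is purely an assumption here; the heuristic Codeburn actually uses is not described in this article. The function names are likewise hypothetical.

```python
import math

# Assumption: about 4 characters per token. This is a generic rule of thumb,
# not Codeburn's documented heuristic.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def cost_band(estimate_usd: float, error: float = 0.20):
    """Low and high bounds for an estimated cost, given a relative error."""
    return estimate_usd * (1 - error), estimate_usd * (1 + error)


message = "a" * 400
print(estimate_tokens(message))  # 400 characters at 4 chars/token

low, high = cost_band(10.0, error=0.20)
print(f"${low:.2f} to ${high:.2f}")
```

Reporting the band alongside the point estimate is a reasonable way to keep estimated tools from being compared too literally against tools that log exact token counts.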
Requires Node.js 20+ and at least one tool that writes session data to disk.
Featured in
Claude Code tools, plugins, and integrations
The best tools, MCP servers, and harnesses for getting more out of Claude Code - orchestration, observability, telemetry, and remote control.
Observability for AI coding agents
Tools that show you what your coding agents are actually doing: token spend, session state, tool calls, and parallel execution.
Tools for OpenAI Codex CLI
The Codex-aware slice of the directory: orchestration, observability, sandboxes, and bridges built specifically for the OpenAI Codex runtime.
Related entries
snip - LLM token-saving proxy in Go
CLI proxy for Claude Code, Cursor, Copilot, and Gemini that strips noise from LLM context with declarative YAML filters. Reports 60-90% token savings on typical agent traffic.
jeeves - TUI for browsing AI agent sessions
Terminal UI to search, preview, read, and resume Claude Code and Codex sessions in a unified view. More framework integrations planned.
coding_agent_session_search - 11-provider session search
Rust TUI and CLI that indexes and searches local coding-agent session history across Codex, Claude Code, Gemini, Cursor, Aider and seven other providers.
Recall - TUI search across agent session history
Local-first Rust TUI that searches Claude Code, Codex, and OpenCode session history with hybrid full-text plus semantic retrieval. Built on ratatui.