caveman - Claude Code skill that talks like caveman
Claude Code skill that makes the agent answer in caveman speech, cutting roughly 65% of output tokens with no measurable quality loss. Joke premise, real savings.
Caveman is a Claude Code skill (and Codex/Gemini/Cursor/Windsurf/Cline/Copilot plugin) that makes the agent respond in caveman-speak and cuts roughly 65% of output tokens with no measurable accuracy loss. The README's tagline ("why use many token when few token do trick") is also a fair summary of the engineering claim.
Two reasons it earns the 9k+ stars: the savings are reproducible from the project's own eval harness, and installation is a single command for nearly every major coding agent. The caveman framing is a Trojan horse for terse-prompt research that would likely be ignored under a less fun name.
What it does, in two examples
Normal Claude (69 tokens):
"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I'd recommend using useMemo to memoize the object."
Caveman Claude (19 tokens):
"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."
Same fix. ~72% fewer tokens (69 → 19). Brain still big.
Intensity levels
Pick your level of grunt:
- Lite (/caveman lite) - drop filler, keep grammar. Professional, no fluff.
- Full (/caveman full) - default caveman. Drop articles, fragments, full grunt.
- Ultra (/caveman ultra) - maximum compression. Telegraphic. Abbreviate everything.
- Wenyan (/caveman wenyan, wenyan-lite, wenyan-ultra) - classical Chinese literary compression. Same accuracy in arguably the most token-efficient written language ever.
Levels stick until you change them or the session ends.
Install
One command for most agents; Codex takes a few manual steps:
| Agent | Install |
|---|---|
| Claude Code | claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman |
| Codex | Clone repo → /plugins → search "Caveman" → Install |
| Gemini CLI | gemini extensions install https://github.com/JuliusBrussee/caveman |
| Cursor | npx skills add JuliusBrussee/caveman -a cursor |
| Windsurf | npx skills add JuliusBrussee/caveman -a windsurf |
| Copilot / Cline / others | npx skills add JuliusBrussee/caveman |
For Claude Code and Gemini, auto-activation happens via SessionStart hooks and context files - install once, get caveman in every future session. For the others, npx skills add installs the skill but not the auto-activation snippet, so you trigger with /caveman, $caveman, or "talk like caveman" each session (or paste the always-on snippet from the README into your system prompt).
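To make the auto-activation idea concrete, here is a minimal sketch of what a SessionStart hook script could look like. It assumes the agent adds whatever the hook prints on stdout to the session context. The names `STYLE_RULES` and `build_context` and the instruction wording are hypothetical, not taken from the plugin.

```python
#!/usr/bin/env python3
"""Hypothetical SessionStart hook: print a terse-style instruction for the session."""

# Illustrative wording only (assumption); the plugin's real prompt text is not reproduced here.
STYLE_RULES = {
    "lite": "Drop filler words. Keep normal grammar.",
    "full": "Drop articles and filler. Sentence fragments are fine.",
    "ultra": "Maximum compression. Telegraphic. Abbreviate.",
}


def build_context(level: str = "full") -> str:
    """Return the instruction text to emit for the given intensity level."""
    if level not in STYLE_RULES:
        raise ValueError(f"unknown level: {level!r}")
    return (
        f"Response style: caveman ({level}). {STYLE_RULES[level]} "
        "Keep code, paths, and commands exact."
    )


if __name__ == "__main__":
    # Assumption: the agent injects this stdout into the context at session start.
    print(build_context("full"))
```

Agents without such a hook get the same effect only when the instruction is pasted into the system prompt or triggered by hand, which matches the split described above.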
The benchmarks
The README publishes its own eval harness output. Real token counts from the Claude API:
| Task | Normal | Caveman | Saved |
|---|---|---|---|
| Explain React re-render bug | 1180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2347 | 380 | 84% |
| Explain git rebase vs merge | 702 | 292 | 58% |
| Refactor callback to async/await | 387 | 301 | 22% |
| Architecture: microservices vs monolith | 446 | 310 | 30% |
| Review PR for security issues | 678 | 398 | 41% |
| Docker multi-stage build | 1042 | 290 | 72% |
| Debug PostgreSQL race condition | 1200 | 232 | 81% |
| Implement React error boundary | 3454 | 456 | 87% |
| Average | 1214 | 294 | 65% |
Range is 22%–87%, depending on how prose-heavy the response naturally is. Architecture explanations compress less; bug explanations compress more.
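The 65% headline is the mean of the per-task percentages. Pooling all tokens across the ten tasks gives a larger figure, because the longest responses also compress the most. The sketch below recomputes both from the table's numbers; the function names are illustrative and are not part of the project's eval harness.

```python
# Token counts copied from the benchmark table: (task, normal, caveman).
BENCHMARKS = [
    ("Explain React re-render bug", 1180, 159),
    ("Fix auth middleware token expiry", 704, 121),
    ("Set up PostgreSQL connection pool", 2347, 380),
    ("Explain git rebase vs merge", 702, 292),
    ("Refactor callback to async/await", 387, 301),
    ("Architecture: microservices vs monolith", 446, 310),
    ("Review PR for security issues", 678, 398),
    ("Docker multi-stage build", 1042, 290),
    ("Debug PostgreSQL race condition", 1200, 232),
    ("Implement React error boundary", 3454, 456),
]


def saved_fraction(normal: int, caveman: int) -> float:
    """Fraction of tokens saved relative to the normal response."""
    return 1 - caveman / normal


def mean_of_task_savings(rows) -> float:
    """Average the per-task savings; every task counts equally."""
    return sum(saved_fraction(n, c) for _, n, c in rows) / len(rows)


def pooled_savings(rows) -> float:
    """Savings on total tokens; long responses dominate."""
    total_normal = sum(n for _, n, _ in rows)
    total_caveman = sum(c for _, _, c in rows)
    return saved_fraction(total_normal, total_caveman)


if __name__ == "__main__":
    print(f"mean of per-task savings: {mean_of_task_savings(BENCHMARKS):.1%}")  # 64.5%
    print(f"pooled savings:           {pooled_savings(BENCHMARKS):.1%}")        # 75.8%
```

Which number matters depends on the workload: a session dominated by a few long explanations will sit closer to the pooled figure, while many short answers will sit closer to the per-task mean.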
The important caveat the README is honest about: caveman only affects output tokens. Thinking/reasoning tokens are untouched. The biggest practical win is readability and response speed; the cost savings are a bonus.
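A small worked example shows why the caveat matters for cost. It assumes thinking tokens are billed alongside visible output and uses a made-up thinking budget of 2,000 tokens; only the 1214 and 294 averages come from the table above.

```python
def overall_output_savings(visible_normal: int, visible_caveman: int,
                           thinking_tokens: int) -> float:
    """Savings on billed output when thinking tokens stay unchanged."""
    before = thinking_tokens + visible_normal
    after = thinking_tokens + visible_caveman
    return 1 - after / before


if __name__ == "__main__":
    # No thinking: the full visible-output saving applies.
    print(f"{overall_output_savings(1214, 294, 0):.1%}")
    # Hypothetical 2,000 thinking tokens per response: the saving shrinks.
    print(f"{overall_output_savings(1214, 294, 2000):.1%}")
```

Under those assumed numbers the saving on billed output drops from about 76% to about 29%, which is why the readability and latency gains are the more dependable win.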
The non-obvious feature: caveman-compress
/caveman makes the agent speak with fewer tokens. caveman-compress makes it read fewer tokens. It rewrites your CLAUDE.md (and any other context files Claude loads at every session start) into caveman-speak, so the agent's input is smaller every time it boots.
/caveman:compress CLAUDE.md
After running:
CLAUDE.md ← compressed (Claude reads this every session, fewer tokens)
CLAUDE.original.md ← human-readable backup (you read and edit this)
The README's measured savings on real CLAUDE.md-style files:
| File | Original | Compressed | Saved |
|---|---|---|---|
| claude-md-preferences.md | 706 | 285 | 59.6% |
| project-notes.md | 1145 | 535 | 53.3% |
| claude-md-project.md | 1122 | 636 | 43.3% |
| todo-list.md | 627 | 388 | 38.1% |
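| Average | 898 | 481 | 46% |
(The four files listed average 900 → 461 tokens, about 49% saved, so the README's Average row presumably includes a file not shown here.)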
Code blocks, URLs, file paths, commands, headings, dates, and version numbers pass through untouched. Only prose gets compressed.
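The pass-through rule can be illustrated with a toy, rule-based stand-in. This is not the project's algorithm: it only drops a hypothetical list of filler words from prose lines, and it protects whole lines (fenced code, headings, anything containing a URL) rather than individual paths or version numbers. `FILLER`, `compress_prose`, and `compress_markdown` are names invented for this sketch.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable

# Hypothetical filler list (assumption); a real compressor would rewrite, not just delete.
FILLER = {"a", "an", "the", "just", "really", "basically", "simply", "very", "please"}


def compress_prose(line: str) -> str:
    """Drop filler words from one prose line, keeping spacing and punctuation."""
    kept = [w for w in line.split(" ") if w.strip(".,;:!?").lower() not in FILLER]
    return " ".join(kept)


def compress_markdown(text: str) -> str:
    """Compress prose lines; pass code fences, headings, and URL lines through."""
    out = []
    in_code = False
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(FENCE):
            in_code = not in_code  # toggle on opening and closing fences
            out.append(line)
        elif in_code or stripped.startswith("#") or "://" in line:
            out.append(line)  # protected content is copied verbatim
        else:
            out.append(compress_prose(line))
    return "\n".join(out)
```

Run on a small file, the heading, the fenced command, and the URL line survive unchanged, while "Please run the tests before a commit." becomes "run tests before commit."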
(Security note from upstream: Snyk flags caveman-compress as High Risk due to subprocess/file patterns. It's a false positive - the project's SECURITY.md explains why.)
Other skills shipped in the same plugin
- caveman-commit - terse commit messages. Conventional Commits, ≤50 char subject, why-over-what.
- caveman-review - one-line PR comments. Example: "L42: 🔴 bug: user null. Add guard." No throat-clearing.
- caveman-help - quick-reference card; lists all modes, skills, commands.
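The commit-message constraints above (Conventional Commits format, subject of at most 50 characters) are easy to check mechanically. The following is a hedged sketch of such a check, not something the plugin ships. The `COMMON_TYPES` set is an assumption based on widely used type names, and `check_subject` is a hypothetical helper.

```python
import re

# Conventional Commits subject: type, optional (scope), optional "!", then ": description".
SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<desc>\S.*)$"
)
MAX_SUBJECT_LEN = 50

# Assumption: a conventional set of types; the spec itself only defines feat and fix.
COMMON_TYPES = {"feat", "fix", "docs", "style", "refactor", "perf",
                "test", "build", "ci", "chore", "revert"}


def check_subject(subject: str) -> list[str]:
    """Return a list of problems with a commit subject; an empty list means it passes."""
    problems = []
    if len(subject) > MAX_SUBJECT_LEN:
        problems.append(f"subject is {len(subject)} chars, limit is {MAX_SUBJECT_LEN}")
    match = SUBJECT_RE.match(subject)
    if match is None:
        problems.append("subject does not match 'type(scope): description'")
    elif match.group("type") not in COMMON_TYPES:
        problems.append(f"unknown type '{match.group('type')}'")
    return problems
```

For example, `check_subject("fix(auth): guard null user before token refresh")` returns an empty list, while a long free-form sentence fails both the length and the format check.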
When to reach for it
- Long agent sessions where output volume is a real cost driver.
- Codebases with bloated CLAUDE.md / context files where session-start tokens are silently hurting you.
- Anyone who finds verbose AI explanations slower to read than to skim.
When not to
- Tasks where the prose itself is the deliverable (writing docs, drafting emails, customer-facing copy).
- Audiences who'll find caveman speech unprofessional. The Lite intensity is the right starting point if you're not sure.
Why this works (the boring research version)
There's a March 2026 paper - "Brevity Constraints Reverse Performance Hierarchies in Language Models" - that found constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks. Verbose isn't always better. Sometimes fewer words means more correct.
Caveman is the practical, opinionated, fun-named expression of that result. The eval harness lives in evals/ if you want to verify the numbers yourself.