caveman - Claude Code skill that talks like caveman

Claude Code skill that makes the agent answer in caveman speech, cutting roughly 65% of output tokens with no measurable quality loss. Joke premise, real savings.

A joke premise, a serious result. Caveman is a Claude Code skill (and a plugin for Codex, Gemini, Cursor, Windsurf, Cline, and Copilot) that makes the agent respond in caveman-speech, cutting roughly 65% of output tokens with no measurable accuracy loss. The README's tagline ("why use many token when few token do trick") is also a fair summary of the engineering claim.

Two reasons it earns the 9k+ stars: the savings are reproducible from the project's own eval harness, and installation is a single command for nearly every major coding agent. The caveman gimmick is a Trojan horse for terse-response research that would be ignored if it shipped under a less fun name.

What it does, in one before-and-after example

Normal Claude (69 tokens):

"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I'd recommend using useMemo to memoize the object."

Caveman Claude (19 tokens):

"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Same fix. ~72% fewer tokens (69 → 19). Brain still big.

Intensity levels

Pick your level of grunt:

  • Lite (/caveman lite) - drop filler, keep grammar. Professional, no fluff.
  • Full (/caveman full) - default caveman. Drop articles, fragments, full grunt.
  • Ultra (/caveman ultra) - maximum compression. Telegraphic. Abbreviate everything.
  • Wenyan (/caveman wenyan, wenyan-lite, wenyan-ultra) - classical Chinese literary compression. Same accuracy in arguably the most token-efficient written language ever.

Levels stick until you change them or the session ends.

Install

One command per agent:

| Agent | Install |
|---|---|
| Claude Code | claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman |
| Codex | Clone repo → /plugins → search "Caveman" → Install |
| Gemini CLI | gemini extensions install https://github.com/JuliusBrussee/caveman |
| Cursor | npx skills add JuliusBrussee/caveman -a cursor |
| Windsurf | npx skills add JuliusBrussee/caveman -a windsurf |
| Copilot / Cline / others | npx skills add JuliusBrussee/caveman |

For Claude Code and Gemini, auto-activation happens via SessionStart hooks and context files - install once, get caveman in every future session. For the others, npx skills add installs the skill but not the auto-activation snippet, so you trigger with /caveman, $caveman, or "talk like caveman" each session (or paste the always-on snippet from the README into your system prompt).

The benchmarks

The README publishes its own eval harness output. Real token counts from the Claude API:

| Task | Normal (tokens) | Caveman (tokens) | Saved |
|---|---|---|---|
| Explain React re-render bug | 1180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2347 | 380 | 84% |
| Explain git rebase vs merge | 702 | 292 | 58% |
| Refactor callback to async/await | 387 | 301 | 22% |
| Architecture: microservices vs monolith | 446 | 310 | 30% |
| Review PR for security issues | 678 | 398 | 41% |
| Docker multi-stage build | 1042 | 290 | 72% |
| Debug PostgreSQL race condition | 1200 | 232 | 81% |
| Implement React error boundary | 3454 | 456 | 87% |
| Average | 1214 | 294 | 65% |

Range is 22%–87%, depending on how prose-heavy the response naturally is. Code-heavy refactors and architecture explanations compress least (22% and 30%); bug explanations compress most (81%–87%).
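The "Average" row is worth unpacking, because there are two reasonable ways to average savings. The sketch below recomputes both from the token counts in the table: the mean of the per-task percentages (every task counts equally) and the pooled figure (total tokens removed over total tokens, so long responses weigh more). The variable and function names are illustrative, not part of the project's eval harness.

```python
# Token counts copied from the table above: (task, normal, caveman).
RESULTS = [
    ("Explain React re-render bug", 1180, 159),
    ("Fix auth middleware token expiry", 704, 121),
    ("Set up PostgreSQL connection pool", 2347, 380),
    ("Explain git rebase vs merge", 702, 292),
    ("Refactor callback to async/await", 387, 301),
    ("Architecture: microservices vs monolith", 446, 310),
    ("Review PR for security issues", 678, 398),
    ("Docker multi-stage build", 1042, 290),
    ("Debug PostgreSQL race condition", 1200, 232),
    ("Implement React error boundary", 3454, 456),
]


def saved_pct(normal: int, caveman: int) -> float:
    """Percentage of tokens removed relative to the normal response."""
    return 100.0 * (normal - caveman) / normal


per_task = [saved_pct(n, c) for _, n, c in RESULTS]

# Mean of the per-task percentages: every task counts equally.
mean_of_pcts = sum(per_task) / len(per_task)

# Pooled savings: total tokens removed over total tokens,
# so long responses carry more weight.
total_normal = sum(n for _, n, _ in RESULTS)
total_caveman = sum(c for _, _, c in RESULTS)
pooled = saved_pct(total_normal, total_caveman)

print(f"mean of per-task savings: {mean_of_pcts:.1f}%")
print(f"pooled savings:           {pooled:.1f}%")
```

On these ten rows the per-task mean comes out at about 64.5%, which matches the table's rounded 65%, while the pooled figure is about 75.8% because the two longest responses also compress well. Which number matters depends on whether your workload looks like many short answers or a few long ones.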

The important caveat the README is honest about: caveman only affects output tokens. Thinking/reasoning tokens are untouched. The biggest practical win is readability and response speed; the cost savings are a bonus.
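A back-of-the-envelope sketch of why that caveat matters: if only the visible output shrinks, the saving across everything the model generates scales with the output's share of the total. The token counts below are hypothetical, chosen only for illustration; the 65% is the README's average from the table above.

```python
def overall_saving(output_tokens: int, thinking_tokens: int, output_saving: float) -> float:
    """Fraction of all generated tokens removed when only visible output shrinks."""
    generated = output_tokens + thinking_tokens
    return output_saving * output_tokens / generated


# Hypothetical response: 1,000 visible output tokens and 2,000 thinking tokens.
# The 65% saving applies to the visible output only.
print(f"{overall_saving(1000, 2000, 0.65):.1%}")
```

With twice as many thinking tokens as visible output, a 65% cut to the output removes only about a fifth of the generated tokens, which is why the readability and speed gains are the more dependable win.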

The non-obvious feature: caveman-compress

/caveman makes the agent speak with fewer tokens. caveman-compress makes it read fewer tokens. It rewrites your CLAUDE.md (and any other context files Claude loads at session start) into caveman-speak, so the agent's input is smaller every time it boots.

/caveman:compress CLAUDE.md

After running:

CLAUDE.md          ← compressed (Claude reads this every session, fewer tokens)
CLAUDE.original.md ← human-readable backup (you read and edit this)

The README's measured savings on real CLAUDE.md-style files:

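| File | Original (tokens) | Compressed (tokens) | Saved |
|---|---|---|---|
| claude-md-preferences.md | 706 | 285 | 59.6% |
| project-notes.md | 1145 | 535 | 53.3% |
| claude-md-project.md | 1122 | 636 | 43.3% |
| todo-list.md | 627 | 388 | 38.1% |
| Average | 898 | 481 | 46% |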

Code blocks, URLs, file paths, commands, headings, dates, and version numbers pass through untouched. Only prose gets compressed.
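To make the pass-through idea concrete, here is a minimal sketch of the general technique: find the spans that must survive verbatim, compress only the text between them, and stitch the pieces back together. This is not caveman-compress's actual implementation. The PROTECTED pattern covers only fenced code, inline code, and URLs, and squeeze_prose is a toy stand-in (it just drops a few filler words) for whatever rewriting step the real tool uses.

```python
import re

FENCE = "`" * 3  # triple backtick, built this way to keep the example readable

# Spans to copy through untouched: fenced code blocks, inline code, URLs.
# Hypothetical and incomplete; file paths, headings, dates, and versions are not handled.
PROTECTED = re.compile(
    FENCE + r".*?" + FENCE + r"|`[^`\n]+`|https?://\S+",
    re.DOTALL,
)

# Toy filler list, for illustration only.
FILLER = re.compile(
    r"\b(?:the|a|an|just|really|basically|simply|please)\b\s*",
    re.IGNORECASE,
)


def squeeze_prose(text: str) -> str:
    """Toy compressor: drop articles and a few filler words."""
    return FILLER.sub("", text)


def compress(markdown: str) -> str:
    """Compress prose while copying protected spans through verbatim."""
    out = []
    pos = 0
    for match in PROTECTED.finditer(markdown):
        out.append(squeeze_prose(markdown[pos:match.start()]))  # prose before the span
        out.append(match.group(0))                              # protected span, unchanged
        pos = match.end()
    out.append(squeeze_prose(markdown[pos:]))                   # trailing prose
    return "".join(out)


sample = (
    "Please run the `npm test` command before a commit. "
    "See https://example.com/the-guide for details."
)
print(compress(sample))
```

Note that the "the" inside the URL survives while the one in the surrounding sentence is dropped; that asymmetry is the whole point of splitting before compressing.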

(Security note from upstream: Snyk flags caveman-compress as High Risk due to subprocess/file patterns. The project describes this as a false positive, and its SECURITY.md explains why.)

Other skills shipped in the same plugin

  • caveman-commit - terse commit messages. Conventional Commits, ≤50 char subject, why-over-what.
  • caveman-review - one-line PR comments, for example "L42: 🔴 bug: user null. Add guard." No throat-clearing.
  • caveman-help - quick-reference card; lists all modes, skills, commands.
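The two mechanical constraints listed for caveman-commit (Conventional Commits form, subject of at most 50 characters) are easy to check yourself. The sketch below is a hypothetical validator, not code from the plugin; the regex follows the common `type(scope)!: description` shape from the Conventional Commits spec and does not restrict which types are allowed.

```python
import re

# Conventional Commits subject: type, optional (scope), optional "!", then ": description".
SUBJECT = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<desc>\S.*)$"
)

MAX_SUBJECT_LEN = 50  # limit quoted in the caveman-commit description above


def check_subject(subject: str) -> list[str]:
    """Return a list of problems; an empty list means the subject passes."""
    problems = []
    if len(subject) > MAX_SUBJECT_LEN:
        problems.append(f"subject is {len(subject)} chars, limit is {MAX_SUBJECT_LEN}")
    if not SUBJECT.match(subject):
        problems.append("subject is not in 'type(scope): description' form")
    return problems


print(check_subject("fix(auth): reject expired tokens before refresh"))
print(check_subject("Fixed the thing where tokens were sometimes expiring too early"))
```

The first subject passes both checks; the second fails both, being over the limit and missing the `type: ` prefix. The "why-over-what" rule is a judgment call that a regex cannot check.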

When to reach for it

  • Long agent sessions where output volume is a real cost driver.
  • Codebases with bloated CLAUDE.md / context files where session-start tokens are silently hurting you.
  • Anyone who ends up skimming verbose AI explanations instead of reading them.

When not to

  • Tasks where the prose itself is the deliverable (writing docs, drafting emails, customer-facing copy).
  • Audiences who'll find caveman speech unprofessional. The Lite intensity is the right starting point if you're not sure.

Why this works (the boring research version)

There's a March 2026 paper - "Brevity Constraints Reverse Performance Hierarchies in Language Models" - that found constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks. Verbose isn't always better. Sometimes fewer words means more correct.

Caveman is the practical, opinionated, fun-named expression of that result. The eval harness lives in evals/ if you want to verify the numbers yourself.
