claude-memory-compiler - evolving codebase memory
Hooks capture Claude Code sessions, the Agent SDK extracts decisions and lessons, and an LLM compiler organizes them into cross-referenced knowledge articles. Memory that grows with the repo.
A small idea executed unusually well. Claude Code captures every session you have. The Claude Agent SDK reads each transcript and pulls out the parts worth keeping - decisions, lessons learned, patterns, gotchas. An LLM compiles those daily fragments into cross-referenced knowledge articles organized by concept. The next session reads an index of those articles before you've typed a word.
The architecture is adapted from Karpathy's LLM Knowledge Base gist, but the raw input has been swapped: instead of clipping web articles, the source is your own conversations with Claude Code. Memory that grows with the repo, written by the agent that's about to need it.
The interesting Anthropic policy detail buried in the README: personal use of the Claude Agent SDK is covered under your existing Claude subscription (Max, Team, Enterprise). Unlike memory tools that require separate API billing for the compaction step, this one runs on the subscription you're already paying for.
Quick start
The author's preferred install path is to delegate it to the agent itself:
Clone https://github.com/coleam00/claude-memory-compiler into this project. Set up the Claude Code hooks so my conversations automatically get captured into daily logs, compiled into a knowledge base, and injected back into future sessions. Read the AGENTS.md for the full technical reference.
The agent runs uv sync to install dependencies, copies .claude/settings.json (or merges the hooks into your existing settings), and the hooks activate the next time you open Claude Code. From there, conversations accumulate automatically.
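For reference, the hook wiring in .claude/settings.json looks roughly like this. The event structure follows Claude Code's hooks format, but the commands are placeholders to check against the repo's own settings.json - in particular, the script name used for SessionStart injection below is hypothetical:

```json
{
  "hooks": {
    "SessionEnd": [
      { "hooks": [ { "type": "command", "command": "uv run python scripts/flush.py" } ] }
    ],
    "PreCompact": [
      { "hooks": [ { "type": "command", "command": "uv run python scripts/flush.py" } ] }
    ],
    "SessionStart": [
      { "hooks": [ { "type": "command", "command": "uv run python scripts/inject_index.py" } ] }
    ]
  }
}
```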
After 6 PM local time, the next session flush triggers daily-log compilation. Manual compilation is available any time:
uv run python scripts/compile.py # compile new daily logs
uv run python scripts/query.py "question" # ask the knowledge base
uv run python scripts/query.py "question" --file-back # ask + save answer back
uv run python scripts/lint.py # run health checks
uv run python scripts/lint.py --structural-only # free structural checks
How it actually flows
Conversation -> SessionEnd/PreCompact hooks -> flush.py extracts knowledge
-> daily/YYYY-MM-DD.md -> compile.py -> knowledge/concepts/, connections/, qa/
-> SessionStart hook injects index into next session -> cycle repeats
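To make the flush step in that flow concrete, here is a minimal sketch of the extraction call, assuming the Python claude-agent-sdk's query() interface. Prompt wording, option values, and file paths are assumptions; the repo's flush.py also handles dedup and the 6 PM compile trigger.

```python
# flush_sketch.py - minimal version of the extraction step: hand the session
# transcript to the Agent SDK and append whatever it deems worth keeping to
# today's daily log. Illustrative only, not the repo's actual flush.py.
import asyncio
import json
import sys
from datetime import date
from pathlib import Path

from claude_agent_sdk import query, ClaudeAgentOptions, AssistantMessage, TextBlock

async def flush() -> None:
    # SessionEnd/PreCompact hooks pass a JSON payload on stdin that includes
    # the path to the session transcript (JSONL); it is handed over raw here.
    payload = json.load(sys.stdin)
    transcript = Path(payload["transcript_path"]).read_text()

    prompt = (
        "Read this Claude Code session transcript and extract only what is "
        "worth keeping: decisions made, lessons learned, recurring patterns, "
        "and gotchas. Return concise markdown bullets.\n\n" + transcript
    )
    options = ClaudeAgentOptions(max_turns=1)  # single extraction turn, no tools

    extracted: list[str] = []
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    extracted.append(block.text)

    # Append to the per-day log; compile.py organises these fragments later.
    daily = Path("daily") / f"{date.today().isoformat()}.md"
    daily.parent.mkdir(parents=True, exist_ok=True)
    with daily.open("a") as f:
        f.write("\n".join(extracted) + "\n")

if __name__ == "__main__":
    asyncio.run(flush())
```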
Five small Python scripts plus the hook configuration:
- Hooks - SessionEnd and PreCompact. The PreCompact one is the safety net that catches sessions that auto-compact mid-conversation rather than ending cleanly.
- flush.py - calls the Claude Agent SDK to decide what's worth saving from a transcript. Triggers end-of-day compilation automatically after 6 PM.
- compile.py - turns daily logs into organised concept articles with cross-references. Manual or automatic.
- query.py - answers questions using index-guided retrieval. No RAG, no vector database.
- lint.py - runs seven health checks: broken links, orphans, contradictions, staleness, plus three more structural checks that run for free (no LLM calls). A sketch of one structural check follows this list.
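As an illustration of the free structural side, a broken-cross-reference check over the compiled markdown can be as small as this (a sketch under an assumed file layout, not the repo's lint.py):

```python
# lint_sketch.py - one "free" structural check: find markdown links whose
# target file does not exist under knowledge/. Illustrative only; the repo's
# lint.py runs seven checks, some of which call the LLM.
import re
from pathlib import Path

LINK_RE = re.compile(r"\[[^\]]+\]\(([^)]+\.md)\)")  # matches [text](some/file.md)

def broken_links(root: Path = Path("knowledge")) -> list[tuple[Path, str]]:
    broken = []
    for article in root.rglob("*.md"):
        for target in LINK_RE.findall(article.read_text()):
            # Accept links resolved relative to the article or to the root.
            if not (article.parent / target).exists() and not (root / target).exists():
                broken.append((article, target))
    return broken

if __name__ == "__main__":
    for article, target in broken_links():
        print(f"{article}: broken link -> {target}")
```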
The index instead of RAG decision
The argument the README makes (inherited from Karpathy's gist): at personal scale - somewhere between 50 and 500 articles - an LLM reading a structured index.md outperforms vector similarity. The LLM understands what you're actually asking; cosine similarity just finds similar words. RAG starts to matter at roughly 2,000+ articles, when the index itself exceeds the context window.
For an individual developer's working knowledge base, you almost certainly stay below that ceiling. Skipping the embedding step removes a whole class of operational complexity (no vector database, no re-indexing, no embedding model drift) and it produces better answers at the scale you actually use.
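A hedged sketch of what index-guided retrieval means in practice, again assuming the Agent SDK's query() call with read-only tool access; the repo's query.py will differ in prompts, options, and how it selects articles:

```python
# query_sketch.py - index-guided retrieval with no embeddings: give the agent
# the index plus Read access and let it open whichever articles it needs.
# Prompt, paths, and options are illustrative, not the repo's query.py.
import asyncio
import sys
from pathlib import Path

from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage

async def ask(question: str) -> str:
    index = Path("knowledge/index.md").read_text()
    prompt = (
        "Here is the index of a personal knowledge base:\n\n"
        f"{index}\n\n"
        "Read whichever articles under knowledge/ look relevant, then answer:\n"
        f"{question}"
    )
    # Read-only tool access keeps this a retrieval pass, not an editing session.
    options = ClaudeAgentOptions(allowed_tools=["Read"], max_turns=10)

    async for message in query(prompt=prompt, options=options):
        if isinstance(message, ResultMessage) and message.result:
            return message.result
    return "(no answer)"

if __name__ == "__main__":
    print(asyncio.run(ask(sys.argv[1])))
```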
What you end up with
Three buckets under knowledge/:
- concepts/ - the organised articles. One per concept; cross-referenced.
- connections/ - relationships between concepts. The graph layer.
- qa/ - distilled question/answer pairs from past sessions.
Plus daily/YYYY-MM-DD.md - the per-day extracted knowledge before compilation. The daily logs are append-only; compilation is the part that organises them into the concept structure.
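Put together, a compiled knowledge base ends up looking something like this (article names are illustrative):

```
knowledge/
  index.md            # the structured index SessionStart injects
  concepts/
    hook-configuration.md
    index-vs-rag.md
  connections/
    hooks--flush-pipeline.md
  qa/
    why-no-vector-db.md
daily/
  YYYY-MM-DD.md       # raw extracted fragments, pre-compilation
```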
When to reach for it
- Long-running projects where the same lessons keep getting re-learned because nothing remembers them.
- Solo developers who've noticed that "context-from-previous-sessions" is a recurring pain.
- Teams that want a per-project memory that's reviewable as plain markdown rather than locked in a vector database.
- Any project where you'd benefit from "what did we decide about X two months ago" being a 200ms shell command instead of an archaeology session.
When not to
- Throwaway repos. The setup overhead doesn't pay back if the project lives a week.
- Multi-developer projects without coordination. The compiled knowledge is local to your machine; sharing it is a separate problem.
- Workloads where you genuinely have 5,000+ articles. RAG starts to matter at that scale; this isn't designed for it.
Trade-offs
The whole thing is plain markdown - both the input (Claude transcripts) and the output (concept articles). That's the right answer for inspection, version control, and grep. It's also the reason this isn't trying to be a multi-user system.
AGENTS.md in the repo is designed as the technical reference - the file is intentionally written for an AI agent to understand, modify, or rebuild the whole system. If you want to fork the architecture for a different harness, that's the file to read.
The 6 PM compilation trigger is a practical choice (most people's sessions are ending around then) but obviously a single-user assumption. For team use, run compile.py on whatever cadence makes sense.
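For instance, a shared runner could compile on a fixed nightly schedule instead of relying on the session-flush trigger (crontab sketch; the project path is assumed):

```
# compile new daily logs into the knowledge base every night at 02:00
0 2 * * * cd /path/to/project && uv run python scripts/compile.py
```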
The author is upfront about the comparison to OpenClaw's memory flush: this version runs on the Claude subscription, not on separate API credits. For long sessions where memory writes are frequent, that's a real cost difference.
Featured in
Claude Code tools, plugins, and integrations
The best tools, MCP servers, and harnesses for getting more out of Claude Code - orchestration, observability, telemetry, and remote control.
Memory and knowledge graphs for AI agents
Memory layers, knowledge graphs, and persistent context stores for agents - the substrate underneath useful long-running systems.
Related entries
mcptube - Karpathy-style LLM wiki for YouTube
MCP server that turns YouTube videos into a persistent, merging wiki rather than ephemeral vector chunks. Scene-change frame extraction + vision analysis captures slides, code, and diagrams that transcripts miss. 25+ MCP tools, FTS5+LLM hybrid retrieval, version history with source attribution per claim.
mex - persistent project memory for AI agents
Structured scaffold and drift-detection CLI that gives Claude Code, Cursor, and other coding agents a project-level memory file that stays in sync with the codebase.
entroly - self-evolving repo context compressor
Rust daemon that compresses 2M-token repos down ~95% into a Principal Engineer-style context for Cursor, Claude Code, Opus, Codex, and custom providers.
MemoMind - local GPU-accelerated memory for Claude Code
Local-first memory system for Claude Code with GPU acceleration and zero cloud dependency. Provides persistent agent memory via MCP.