claude-memory-compiler - evolving codebase memory
Hooks capture Claude Code sessions, the Agent SDK extracts decisions and lessons, and an LLM compiler organizes them into cross-referenced knowledge articles. Memory that grows with the repo.
A small idea executed unusually well. Claude Code captures every session you have. The Claude Agent SDK reads each transcript and pulls out the parts worth keeping - decisions, lessons learned, patterns, gotchas. An LLM compiles those daily fragments into cross-referenced knowledge articles organized by concept. The next session reads an index of those articles before you've typed a word.
The architecture is adapted from Karpathy's LLM Knowledge Base gist, but the raw input has been swapped: instead of clipping web articles, the source is your own conversations with Claude Code. Memory that grows with the repo, written by the agent that's about to need it.
The interesting Anthropic policy detail buried in the README: personal use of the Claude Agent SDK is covered under your existing Claude subscription (Max, Team, Enterprise). Unlike memory tools that require separate API billing for the compaction step, this one runs on the subscription you're already paying for.
Quick start
The author's preferred install path is to delegate it to the agent itself:
Clone https://github.com/coleam00/claude-memory-compiler into this project. Set up the Claude Code hooks so my conversations automatically get captured into daily logs, compiled into a knowledge base, and injected back into future sessions. Read the AGENTS.md for the full technical reference.
The agent runs uv sync to install dependencies, copies .claude/settings.json (or merges the hooks into your existing settings), and the hooks activate the next time you open Claude Code. From there, conversations accumulate automatically.
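For reference, the hook wiring in .claude/settings.json looks roughly like this. The event structure follows Claude Code's hooks format, but the commands are placeholders to check against the repo's own settings.json - in particular, the script name used for SessionStart injection below is hypothetical:

```json
{
  "hooks": {
    "SessionEnd": [
      { "hooks": [ { "type": "command", "command": "uv run python scripts/flush.py" } ] }
    ],
    "PreCompact": [
      { "hooks": [ { "type": "command", "command": "uv run python scripts/flush.py" } ] }
    ],
    "SessionStart": [
      { "hooks": [ { "type": "command", "command": "uv run python scripts/inject_index.py" } ] }
    ]
  }
}
```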
After 6 PM local time, the next session flush triggers daily-log compilation. Manual compilation is available any time:
uv run python scripts/compile.py # compile new daily logs
uv run python scripts/query.py "question" # ask the knowledge base
uv run python scripts/query.py "question" --file-back # ask + save answer back
uv run python scripts/lint.py # run health checks
uv run python scripts/lint.py --structural-only # free structural checks
How it actually flows
Conversation -> SessionEnd/PreCompact hooks -> flush.py extracts knowledge
-> daily/YYYY-MM-DD.md -> compile.py -> knowledge/concepts/, connections/, qa/
-> SessionStart hook injects index into next session -> cycle repeats
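To make the flush step in that flow concrete, here is a minimal sketch of the extraction call, assuming the Python claude-agent-sdk's query() interface. Prompt wording, option values, and file paths are assumptions; the repo's flush.py also handles dedup and the 6 PM compile trigger.

```python
# flush_sketch.py - minimal version of the extraction step: hand the session
# transcript to the Agent SDK and append whatever it deems worth keeping to
# today's daily log. Illustrative only, not the repo's actual flush.py.
import asyncio
import json
import sys
from datetime import date
from pathlib import Path

from claude_agent_sdk import query, ClaudeAgentOptions, AssistantMessage, TextBlock

async def flush() -> None:
    # SessionEnd/PreCompact hooks pass a JSON payload on stdin that includes
    # the path to the session transcript (JSONL); it is handed over raw here.
    payload = json.load(sys.stdin)
    transcript = Path(payload["transcript_path"]).read_text()

    prompt = (
        "Read this Claude Code session transcript and extract only what is "
        "worth keeping: decisions made, lessons learned, recurring patterns, "
        "and gotchas. Return concise markdown bullets.\n\n" + transcript
    )
    options = ClaudeAgentOptions(max_turns=1)  # single extraction turn, no tools

    extracted: list[str] = []
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    extracted.append(block.text)

    # Append to the per-day log; compile.py organises these fragments later.
    daily = Path("daily") / f"{date.today().isoformat()}.md"
    daily.parent.mkdir(parents=True, exist_ok=True)
    with daily.open("a") as f:
        f.write("\n".join(extracted) + "\n")

if __name__ == "__main__":
    asyncio.run(flush())
```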
Five small Python scripts plus the hook configuration:
- Hooks - SessionEnd and PreCompact. The PreCompact one is the safety net that catches sessions that auto-compact mid-conversation rather than ending cleanly.
- flush.py - calls the Claude Agent SDK to decide what's worth saving from a transcript. Triggers end-of-day compilation automatically after 6 PM.
- compile.py - turns daily logs into organised concept articles with cross-references. Manual or automatic.
- query.py - answers questions using index-guided retrieval. No RAG, no vector database.
- lint.py - runs seven health checks: broken links, orphans, contradictions, staleness, plus three more structural checks that run for free (no LLM calls). A sketch of one structural check follows this list.
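As an illustration of the free structural side, a broken-cross-reference check over the compiled markdown can be as small as this (a sketch under an assumed file layout, not the repo's lint.py):

```python
# lint_sketch.py - one "free" structural check: find markdown links whose
# target file does not exist under knowledge/. Illustrative only; the repo's
# lint.py runs seven checks, some of which call the LLM.
import re
from pathlib import Path

LINK_RE = re.compile(r"\[[^\]]+\]\(([^)]+\.md)\)")  # matches [text](some/file.md)

def broken_links(root: Path = Path("knowledge")) -> list[tuple[Path, str]]:
    broken = []
    for article in root.rglob("*.md"):
        for target in LINK_RE.findall(article.read_text()):
            # Accept links resolved relative to the article or to the root.
            if not (article.parent / target).exists() and not (root / target).exists():
                broken.append((article, target))
    return broken

if __name__ == "__main__":
    for article, target in broken_links():
        print(f"{article}: broken link -> {target}")
```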
The index instead of RAG decision
The argument the README makes (inherited from Karpathy's gist): at personal scale - somewhere between 50 and 500 articles - an LLM reading a structured index.md outperforms vector similarity. The LLM understands what you're actually asking; cosine similarity just finds similar words. RAG starts to matter at roughly 2,000+ articles, when the index itself exceeds the context window.
For an individual developer's working knowledge base, you almost certainly stay below that ceiling. Skipping the embedding step removes a whole class of operational complexity (no vector database, no re-indexing, no embedding model drift) and it produces better answers at the scale you actually use.
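A hedged sketch of what index-guided retrieval means in practice, again assuming the Agent SDK's query() call with read-only tool access; the repo's query.py will differ in prompts, options, and how it selects articles:

```python
# query_sketch.py - index-guided retrieval with no embeddings: give the agent
# the index plus Read access and let it open whichever articles it needs.
# Prompt, paths, and options are illustrative, not the repo's query.py.
import asyncio
import sys
from pathlib import Path

from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage

async def ask(question: str) -> str:
    index = Path("knowledge/index.md").read_text()
    prompt = (
        "Here is the index of a personal knowledge base:\n\n"
        f"{index}\n\n"
        "Read whichever articles under knowledge/ look relevant, then answer:\n"
        f"{question}"
    )
    # Read-only tool access keeps this a retrieval pass, not an editing session.
    options = ClaudeAgentOptions(allowed_tools=["Read"], max_turns=10)

    async for message in query(prompt=prompt, options=options):
        if isinstance(message, ResultMessage) and message.result:
            return message.result
    return "(no answer)"

if __name__ == "__main__":
    print(asyncio.run(ask(sys.argv[1])))
```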
What you end up with
Three buckets under knowledge/:
- concepts/ - the organised articles. One per concept; cross-referenced.
- connections/ - relationships between concepts. The graph layer.
- qa/ - distilled question/answer pairs from past sessions.
Plus daily/YYYY-MM-DD.md - the per-day extracted knowledge before compilation. The daily logs are append-only; compilation is the part that organises them into the concept structure.
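Put together, a compiled knowledge base ends up looking something like this (article names are illustrative):

```
knowledge/
  index.md            # the structured index SessionStart injects
  concepts/
    hook-configuration.md
    index-vs-rag.md
  connections/
    hooks--flush-pipeline.md
  qa/
    why-no-vector-db.md
daily/
  YYYY-MM-DD.md       # raw extracted fragments, pre-compilation
```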
When to reach for it
- Long-running projects where the same lessons keep getting re-learned because nothing remembers them.
- Solo developers who've noticed that "context-from-previous-sessions" is a recurring pain.
- Teams that want a per-project memory that's reviewable as plain markdown rather than locked in a vector database.
- Any project where you'd benefit from "what did we decide about X two months ago" being a 200ms shell command instead of an archaeology session.
When not to
- Throwaway repos. The setup overhead doesn't pay back if the project lives a week.
- Multi-developer projects without coordination. The compiled knowledge is local to your machine; sharing it is a separate problem.
- Workloads where you genuinely have 5,000+ articles. RAG starts to matter at that scale; this isn't designed for it.
Trade-offs
The whole thing is plain markdown - both the input (Claude transcripts) and the output (concept articles). That's the right answer for inspection, version control, and grep. It's also the reason this isn't trying to be a multi-user system.
AGENTS.md in the repo is designed as the technical reference - the file is intentionally written for an AI agent to understand, modify, or rebuild the whole system. If you want to fork the architecture for a different harness, that's the file to read.
The 6 PM compilation trigger is a practical choice (most people's sessions are ending around then) but obviously a single-user assumption. For team use, run compile.py on whatever cadence makes sense.
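For instance, a shared runner could compile on a fixed nightly schedule instead of relying on the session-flush trigger (crontab sketch; the project path is assumed):

```
# compile new daily logs into the knowledge base every night at 02:00
0 2 * * * cd /path/to/project && uv run python scripts/compile.py
```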
The author is upfront about the comparison to OpenClaw's memory flush: this version runs on the Claude subscription, not on separate API credits. For long sessions where memory writes are frequent, that's a real cost difference.
Featured in
Claude Code tools, plugins, and integrations
The best tools, MCP servers, and harnesses for getting more out of Claude Code - orchestration, observability, telemetry, and remote control.
Memory and knowledge graphs for AI agents
Memory layers, knowledge graphs, and persistent context stores for agents - the substrate underneath useful long-running systems.
Related entries
mcptube - Karpathy-style LLM wiki for YouTube
MCP server that turns YouTube videos into a persistent, merging wiki rather than ephemeral vector chunks. Scene-change frame extraction + vision analysis captures slides, code, and diagrams that transcripts miss. 25+ MCP tools, FTS5+LLM hybrid retrieval, version history with source attribution per claim.
mex - persistent project memory for AI agents
Structured scaffold and drift-detection CLI that gives Claude Code, Cursor, and other coding agents a project-level memory file that stays in sync with the codebase.
entroly - self-evolving repo context compressor
Rust daemon that compresses 2M-token repos down ~95% into a Principal Engineer-style context for Cursor, Claude Code, Opus, Codex, and custom providers.
MemoMind - local GPU-accelerated memory for Claude Code
Local-first memory system for Claude Code with GPU acceleration and zero cloud dependency. Provides persistent agent memory via MCP.