s09

Memory

Memory Management

Keep a Layer That Doesn't Lose Details

528 LOC · 6 tools · Durable memory layer

Some facts should survive summarization and future sessions.

Figure: Memory Library (learn → catalog → recall). Session A learns a fact worth keeping ("Please keep LCC pages concrete for beginners."). The fact is filed in the .memory library and listed in the MEMORY.md catalog. Session B recalls it when a future request arrives.

Beginner rule: the catalog stays cheap and readable; full memory files are borrowed only when the current request needs them.

"Compression loses details; keep a layer that doesn't." File store + index + on-demand loading, across compactions and across sessions.

Harness Layer: Memory — knowledge that survives compaction and sessions.

The Problem

s08's autoCompact preserves current goals, remaining work, and user constraints in the summary, but details get lost: "use tabs not spaces" might get simplified to "user has code style preferences". And when you start a new session, even the summary is gone.

LLMs have no persistent state; all information lives in the context window. When context fills up, it gets compressed, and compression is lossy. What's needed is a storage layer that doesn't participate in compression and persists across sessions.

The Solution

Memory Overview

The s08 compression pipeline stays in place, and this chapter adds a memory layer beside it. Storage uses the filesystem: a .memory/ directory where each memory is a .md file with YAML frontmatter (name / description / type). As files accumulate, an index becomes necessary. MEMORY.md holds one link per line and is injected into the SYSTEM prompt.

Key design: the index stays in the SYSTEM prompt, where the prompt cache can reuse it. File contents are injected on demand: they are selected by matching filename/description against the current conversation, so the cached SYSTEM prefix is not disturbed. Writing has two paths: the user explicitly says "remember", or extraction runs in the background after each turn. As files accumulate, periodic consolidation deduplicates them.

Four memory types, each answering a different question:

Type	Answers	Example
user	Who you are	"Use tabs not spaces"
feedback	How to work	"Don't mock the database"
project	What's happening	"Auth rewrite is compliance-driven"
reference	Where to find things	"Pipeline bugs are in Linear INGEST"

How It Works

Memory Subsystems

Storage: Markdown Files + Index

Each memory is a .md file with YAML frontmatter for metadata:

---
name: user-preference-tabs
description: User prefers tabs for indentation
type: user
---

User prefers using tabs, not spaces, for indentation.
**Why:** Consistency with existing codebase conventions.
**How to apply:** Always use tabs when writing or editing files.
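Reading a memory back means splitting the frontmatter from the body. Below is a minimal sketch. It assumes the flat `key: value` frontmatter shown above rather than arbitrary YAML, so it needs no PyYAML dependency. `parse_memory_file` is a hypothetical helper name, not a function from the teaching code:

```python
def parse_memory_file(text: str) -> dict:
    """Split a memory file into frontmatter fields plus a "body" entry.

    Assumes flat `key: value` frontmatter between two `---` lines, as in
    the example above; nested YAML is not supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No frontmatter: treat the whole file as body
        return {"body": text.strip()}
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the memory body
            meta["body"] = "\n".join(lines[i + 1:]).strip()
            return meta
        # Split on the first colon only, so descriptions may contain colons
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # Unterminated frontmatter: keep the parsed fields, body is empty
    meta["body"] = ""
    return meta


example = """---
name: user-preference-tabs
description: User prefers tabs for indentation
type: user
---

User prefers using tabs, not spaces, for indentation.
"""
mem = parse_memory_file(example)
print(mem["name"], mem["type"])  # user-preference-tabs user
```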

MEMORY.md is the index, one link per line:

- [user-preference-tabs](user-preference-tabs.md) — User prefers tabs for indentation

Writing a new memory automatically rebuilds the index:

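from pathlib import Path

MEMORY_DIR = Path(".memory")

def write_memory_file(name, mem_type, description, body):
    # "User Preference Tabs" -> "user-preference-tabs" for the filename
    slug = name.lower().replace(" ", "-")
    MEMORY_DIR.mkdir(parents=True, exist_ok=True)  # first write creates .memory/
    filepath = MEMORY_DIR / f"{slug}.md"
    filepath.write_text(
        f"---\nname: {name}\ndescription: {description}\ntype: {mem_type}\n---\n\n{body}\n",
        encoding="utf-8",
    )
    _rebuild_index()  # regenerate MEMORY.md so the catalog lists the new file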

Loading: Two Paths

Path 1: Index in SYSTEM. build_system() reads MEMORY.md once at the start of each user request and injects the memory catalog into the SYSTEM prompt. Memory extraction and consolidation run only when the turn ends, so SYSTEM does not need to be rebuilt repeatedly within the same user request.
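Below is a sketch of Path 1. It assumes a static `base_prompt` string and a `.memory/` directory. The `# Memory catalog` heading and its one-line instruction are hypothetical wording, not the teaching code's:

```python
from pathlib import Path


def build_system(base_prompt: str, memory_dir: Path = Path(".memory")) -> str:
    """Assemble the SYSTEM prompt once per user request.

    The catalog goes after the static instructions, so the instruction
    prefix is the same for every request.
    """
    index_path = memory_dir / "MEMORY.md"
    if not index_path.is_file():
        return base_prompt  # no memories yet: nothing to inject
    catalog = index_path.read_text(encoding="utf-8").strip()
    if not catalog:
        return base_prompt
    return (
        f"{base_prompt}\n\n"
        "# Memory catalog\n"  # hypothetical heading text
        "Each line links to a memory file; relevant files are loaded on demand.\n"
        f"{catalog}"
    )
```

Call it once per user request and reuse the returned string for every tool-use round trip in that request. This keeps the SYSTEM prompt byte-identical within the request, which is what lets the prompt cache hit.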

Path 2: Relevant memories on demand. At the start of each user request, load_memories() sends the recent conversation and the memory catalog (name + description) to the LLM as a lightweight side-query, selects relevant filenames, then reads and injects their contents. Capped at 5 to control cost.

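import json
import re

def select_relevant_memories(messages, max_items=5):
    files = list_memory_files()
    if not files:
        return []

    # Build catalog: "0: user-preference-tabs - User prefers tabs..."
    catalog = "\n".join(f"{i}: {f['name']} - {f['description']}" for i, f in enumerate(files))
    # The last few messages give the selector its context (window size is a tunable choice)
    recent = format_recent_messages(messages[-10:])

    response = client.messages.create(model=MODEL, max_tokens=200, messages=[{"role": "user",
        "content": f"Select at most {max_items} relevant memory indices. Return JSON array.\n\n"
                   f"Recent conversation:\n{recent}\n\nMemory catalog:\n{catalog}"}])
    # A reply without a JSON array raises, so the caller can fall back to keyword matching
    match = re.search(r"\[.*?\]", response.content[0].text, re.DOTALL)
    if match is None:
        raise ValueError("side-query returned no JSON array")
    indices = json.loads(match.group())
    # Ignore out-of-range or non-integer indices, then enforce the cap
    valid = [i for i in indices if isinstance(i, int) and 0 <= i < len(files)]
    return [files[i]["filename"] for i in valid[:max_items]]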

If the side-query fails (API error, JSON parse failure), it falls back to keyword matching on name + description.
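The fallback and the injection step can be sketched together. This minimal sketch makes three assumptions:

- Memory metadata comes as dicts with `filename`, `name`, and `description` keys.
- A zero-argument `select_fn` wraps the side-query.
- Each file is wrapped in a `<memory>` tag.

Word-overlap scoring, `keyword_fallback`, and `load_memories_from` are hypothetical illustrations, not the teaching code:

```python
import re
from pathlib import Path
from typing import Callable


def _words(text: str) -> set[str]:
    """Lowercase word set; hyphens split slugs such as 'user-preference-tabs'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_fallback(query: str, files: list[dict], max_items: int = 5) -> list[str]:
    """Rank memories by how many query words appear in name + description."""
    query_words = _words(query)
    scored = []
    for f in files:
        overlap = len(query_words & _words(f"{f['name']} {f['description']}"))
        if overlap:
            scored.append((overlap, f["filename"]))
    # Highest overlap first; the filename breaks ties so results are stable
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [filename for _, filename in scored[:max_items]]


def load_memories_from(query: str, files: list[dict], memory_dir: Path,
                       select_fn: Callable[[], list[str]], max_items: int = 5) -> str:
    """Pick memories via the side-query, fall back to keywords, then format them."""
    try:
        selected = select_fn()
    except Exception:  # API error, JSON parse failure, ...
        selected = keyword_fallback(query, files, max_items)
    blocks = []
    for filename in selected[:max_items]:
        path = memory_dir / filename
        if path.is_file():  # skip names the selector invented
            content = path.read_text(encoding="utf-8").strip()
            blocks.append(f'<memory file="{filename}">\n{content}\n</memory>')
    return "\n\n".join(blocks)
```

Plain word overlap also matches filler words such as "the", so a real fallback would likely drop stopwords. It is still a useful floor when the side-query is unavailable.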

Writing: Extraction After Each Turn

Users don't always say "remember this". Preferences are usually scattered across normal dialogue: "tabs are better than spaces", "let's use single quotes from now on".

extract_memories() runs when each turn ends, triggered when the model stops without a tool_use (indicating the conversation has reached a natural break):

# In agent_loop:
if response.stop_reason != "tool_use":
    extract_memories(messages)   # Extract new memories from recent dialogue
    consolidate_memories()       # Check if consolidation is needed
    return
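This self-contained sketch uses a scripted fake model to show why the hooks fire once per user request rather than once per API call. `FakeResponse`, `run_turn`, and the hook callables are hypothetical stand-ins for the real client and the s09 functions:

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class FakeResponse:
    stop_reason: str  # "tool_use" or "end_turn", mirroring the API field


def run_turn(responses: Iterator[FakeResponse], on_turn_end: list[Callable[[], None]]) -> int:
    """Loop until the model stops without a tool call, then run the hooks.

    Returns the number of model calls made during the turn.
    """
    calls = 0
    for response in responses:
        calls += 1
        if response.stop_reason != "tool_use":
            # Natural break: e.g. extract_memories, then consolidate_memories
            for hook in on_turn_end:
                hook()
            return calls
        # A real loop would execute tools and append tool_result messages here
    return calls


fired = []
script = iter([FakeResponse("tool_use"), FakeResponse("tool_use"), FakeResponse("end_turn")])
n = run_turn(script, [lambda: fired.append("extract"), lambda: fired.append("consolidate")])
print(n, fired)  # 3 ['extract', 'consolidate']
```

Three model calls happen, but extraction and the consolidation check run only once, after the final answer.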

Before extraction, existing memories are checked to avoid duplicates. The extraction prompt asks the LLM to return a JSON array of {name, type, description, body}, writing files only when genuinely new information is found.

def extract_memories(messages):
    dialogue = format_recent_messages(messages[-10:])
    existing = "\n".join(f"- {m['name']}: {m['description']}" for m in list_memory_files())

    prompt = (
        "Extract user preferences, constraints, or project facts.\n"
        "Return JSON array: [{name, type, description, body}].\n"
        "If nothing new or already covered, return [].\n\n"
        f"Existing memories:\n{existing}\n\nDialogue:\n{dialogue[:4000]}"
    )
    # ... parse response, write files ...
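The parsing step elided above has to cope with three problems: replies that wrap the JSON in prose, unknown types, and names that already exist. Below is one way it could look. It assumes the `[{name, type, description, body}]` shape the prompt requests. `parse_extracted` is a hypothetical helper; each record it returns would be passed to `write_memory_file`:

```python
import json
import re

VALID_TYPES = {"user", "feedback", "project", "reference"}
REQUIRED_KEYS = ("name", "type", "description", "body")


def parse_extracted(reply: str, existing_names: set[str]) -> list[dict]:
    """Turn the extraction reply into memory records that are safe to write."""
    match = re.search(r"\[.*\]", reply, re.DOTALL)  # greedy: first "[" to last "]"
    if match is None:
        return []
    try:
        items = json.loads(match.group())
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    seen = set(existing_names)  # copy, so the caller's set is not mutated
    records = []
    for item in items:
        if not isinstance(item, dict) or not all(item.get(k) for k in REQUIRED_KEYS):
            continue  # malformed entry or empty field
        name, mem_type = str(item["name"]), str(item["type"])
        if mem_type not in VALID_TYPES:
            continue  # only the four memory types are accepted
        if name in seen:
            continue  # already covered by an existing memory
        seen.add(name)  # also drop duplicates inside one reply
        records.append({k: str(item[k]) for k in REQUIRED_KEYS})
    return records
```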

Consolidation: Low-Frequency Deduplication

Memory files accumulate. consolidate_memories() triggers when the file count reaches a threshold (default 10), asking the LLM to deduplicate, merge contradictions, and prune stale memories:

CONSOLIDATE_THRESHOLD = 10

def consolidate_memories():
    files = list_memory_files()
    if len(files) < CONSOLIDATE_THRESHOLD:
        return  # Too few, not worth consolidating
    # Send all memories to LLM, get back deduplicated list
    # Replace all files with consolidated results
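The last step, replacing all files with the consolidated result, is the one that can destroy data. Below is a minimal sketch. It assumes the LLM has already returned a list of `{name, type, description, body}` records. `apply_consolidation` is hypothetical. It writes the new files before deleting stale ones, and it refuses an empty result rather than wiping the store:

```python
from pathlib import Path

INDEX_NAME = "MEMORY.md"


def _slug(name: str) -> str:
    # Same slug rule as write_memory_file: lowercase, spaces to hyphens
    return name.lower().replace(" ", "-")


def apply_consolidation(memory_dir: Path, consolidated: list[dict]) -> list[str]:
    """Replace every memory file with the consolidated set; return kept filenames."""
    if not consolidated:
        # An empty reply is more likely a failed call than "forget everything"
        raise ValueError("refusing to replace all memories with nothing")
    memory_dir.mkdir(parents=True, exist_ok=True)
    keep = set()
    # 1. Write (or overwrite) the consolidated memories first
    for mem in consolidated:
        filename = f"{_slug(mem['name'])}.md"
        (memory_dir / filename).write_text(
            f"---\nname: {mem['name']}\ndescription: {mem['description']}\n"
            f"type: {mem['type']}\n---\n\n{mem['body']}\n",
            encoding="utf-8",
        )
        keep.add(filename)
    # 2. Delete memories that were merged away or pruned (the index is kept)
    for path in list(memory_dir.glob("*.md")):
        if path.name != INDEX_NAME and path.name not in keep:
            path.unlink()
    # 3. The caller should rebuild MEMORY.md afterwards, e.g. via _rebuild_index()
    return sorted(keep)
```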

CC calls this process Dream, with four gates in practice: time interval, scan throttle, session count, file lock. The teaching version simplifies to a file-count threshold.

What Memory Stores

Memory stores information that remains useful across sessions: user preferences, recurring feedback, project background, common entry points, and investigation clues. It focuses on "what will be useful later" and brings that information back through an index plus on-demand loading.

Session memory focuses on continuity inside one session: what context should survive after compaction. The two work together: Memory handles long-term knowledge; session memory handles the current session across compaction.

Changes From s08

Component	Before (s08)	After (s09)
Memory capability	None (preferences degrade with compaction)	Storage + loading + extraction + consolidation
New functions	—	write_memory_file, select_relevant_memories, load_memories, extract_memories, consolidate_memories
Storage	—	.memory/MEMORY.md index + .memory/*.md files
Tools	bash, read, write, edit, glob, todo_write, task, load_skill, compact (9)	bash, read_file, write_file, edit_file, glob, task (6)
Loop	Only compression each turn	Memory injection + compression + post-turn extraction + periodic consolidation

Try It

cd learn-claude-code
python s09_memory/code.py

Try these prompts (enter across multiple turns, observe memory accumulation and loading):

I prefer using tabs for indentation, not spaces. Remember that.
Create a Python file called test.py (observe whether the Agent uses tabs)
What did I tell you about my preferences? (observe whether the Agent remembers)
I also prefer single quotes over double quotes for strings.

What to watch for: Does [Memory: extracted N new memories] appear after each turn? Are .md files generated in .memory/? Is MEMORY.md index updated? Does the Agent automatically load previous memories in new conversations?

What's Next

Memory, compression, and tools are all in place. But the system prompt is still a hardcoded string. Adding a new tool means manually adding a description; switching projects means rewriting the whole prompt. Prompts should be assembled at runtime.

s10 System Prompt → segments + runtime assembly. Different projects, different tools, different prompts.

Deep Dive Into CC Source Code

The following is based on an analysis of the CC source under src/, specifically memdir/, services/, utils/, and query/. Line numbers have been verified against the source.

Source Code Paths

File	Lines	Responsibility
`memdir/memdir.ts`	507	Core: MEMORY.md definition (`34-38`), memory behavior instructions distinguishing memory/plan/tasks (`199-266`), `loadMemoryPrompt()` three paths (`419-490`)
`memdir/findRelevantMemories.ts`	141	Sonnet side-query memory selection (`18-24` system prompt, `97-122` call logic)
`memdir/memoryTypes.ts`	271	Type definitions, frontmatter fields
`memdir/memoryScan.ts`	—	Scan .md files, exclude MEMORY.md, read frontmatter, max 200 files, sorted by mtime desc (`35-94`)
`services/extractMemories/extractMemories.ts`	615	Forked agent extraction, restricted permissions, `skipTranscript: true`, `maxTurns: 5` (`371-427`)
`services/autoDream/autoDream.ts`	324	Dream consolidation, four-layer gating (`63-66` defaults, `130-190` gating, `224-233` forked agent)
`services/SessionMemory/sessionMemory.ts`	495	Session-level memory management
`services/compact/sessionMemoryCompact.ts`	—	Session memory lightweight summary, thresholds 10K/5/40K (`56-61`)
`utils/attachments.ts`	—	Injection budget: 200 lines / 4096 bytes per file, 60KB per session (`269-288`); find relevant memory by query (`2196-2241`)
`query.ts`	—	Memory prefetch at start of each user turn (`301-304`), non-blocking collection (`1592-1614`)
`query/stopHooks.ts`	—	Stop hook fire-and-forget triggers extraction and Dream (`141-155`)

Memory Selection: LLM, Not Embedding

CC asks Sonnet itself to select memories (findRelevantMemories.ts) instead of using embedding vector similarity:

memoryScan.ts scans all .md files in .memory/ (excluding MEMORY.md), max 200 files, sorted by mtime descending
Lists all memory files' name + description as a catalog
Sends to Sonnet side-query: "Select truly useful memories by name and description (max 5). Skip if unsure."
Sonnet returns { selected_memories: ["file1.md", ...] }
Selected files' full contents are read (≤ 200 lines / 4096 bytes per file) and injected. Total session budget: 60KB
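Those limits can be mirrored in Python. The numbers come from the analysis above: 200 files scanned, 200 lines / 4096 bytes per file, and 60KB per session, with 60KB read here as 60 × 1024 bytes. The function and class names are hypothetical, and this is not a port of the TypeScript:

```python
from pathlib import Path

MAX_FILES = 200
MAX_LINES_PER_FILE = 200
MAX_BYTES_PER_FILE = 4096
SESSION_BUDGET_BYTES = 60 * 1024  # assumption: KB = 1024 bytes


def scan_memory_files(memory_dir: Path) -> list[Path]:
    """Newest-first memory files, excluding the MEMORY.md index, capped at MAX_FILES."""
    paths = [p for p in memory_dir.glob("*.md") if p.name != "MEMORY.md"]
    paths.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return paths[:MAX_FILES]


def truncate_for_injection(text: str) -> str:
    """Clip to the per-file line and byte limits without splitting a UTF-8 character."""
    clipped = "\n".join(text.splitlines()[:MAX_LINES_PER_FILE])
    data = clipped.encode("utf-8")[:MAX_BYTES_PER_FILE]
    # errors="ignore" drops a multi-byte character cut in half at the end
    return data.decode("utf-8", errors="ignore")


class SessionBudget:
    """Tracks how many bytes of memory content this session has injected."""

    def __init__(self, limit: int = SESSION_BUDGET_BYTES):
        self.limit = limit
        self.used = 0

    def try_spend(self, text: str) -> bool:
        size = len(text.encode("utf-8"))
        if self.used + size > self.limit:
            return False  # over budget: skip this file
        self.used += size
        return True
```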

At the start of each user turn, query.ts:301-304 starts memory prefetch (async); after tool execution, 1592-1614 collects completed results non-blocking.
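CC's prefetch is asynchronous TypeScript. The same pattern (start early, collect later without waiting) can be sketched in Python with `concurrent.futures`. `MemoryPrefetch` and `select_fn` are hypothetical names, and treating a failed prefetch as "no memories" is an assumption:

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

_executor = ThreadPoolExecutor(max_workers=1)


class MemoryPrefetch:
    """Start memory selection at turn start; collect it later without blocking."""

    def __init__(self, select_fn: Callable[[], list[str]]):
        self._future: Future = _executor.submit(select_fn)
        self._consumed = False

    def collect(self) -> Optional[list[str]]:
        """Return the selection once, if it has finished; never waits."""
        if self._consumed or not self._future.done():
            return None  # not ready (or already injected): try after the next tool call
        self._consumed = True
        if self._future.exception() is not None:
            return []  # failed prefetch: continue the turn without memories
        return self._future.result()
```

The agent loop would create a `MemoryPrefetch` when the user message arrives and call `collect()` after each tool execution. The first non-`None` result is injected, so a slow side-query never delays the main flow.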

Extraction Timing: Stop Hook, Not After autoCompact

Trigger location (stopHooks.ts:141-155): handleStopHooks() triggers extraction and Dream in fire-and-forget fashion. The teaching version places extraction in the stop_reason != "tool_use" branch, which follows the same idea.

CC's extraction runs via a forked agent (extractMemories.ts:371-427) with restricted permissions, skipTranscript: true, and maxTurns: 5. It also has overlap protection: if the main Agent has already written memory files, extraction is skipped.
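Fire-and-forget plus the overlap check can be sketched with a daemon thread. This sketch makes two assumptions:

- Tool calls are recorded as dicts with `name` and `input` keys.
- The main agent saves memories through `write_file`/`edit_file` with a `path` argument.

`main_agent_wrote_memory` and `on_stop` are hypothetical names. A thread also gives far weaker isolation than CC's forked agent with restricted permissions:

```python
import threading
from pathlib import Path
from typing import Callable, Optional

WRITE_TOOLS = {"write_file", "edit_file"}  # assumption: the tools that can save memories


def main_agent_wrote_memory(tool_calls: list[dict], memory_dir: Path) -> bool:
    """True if any write/edit call this turn targeted a file under memory_dir."""
    root = memory_dir.resolve()
    for call in tool_calls:
        if call.get("name") not in WRITE_TOOLS:
            continue
        target = Path(call.get("input", {}).get("path", "")).resolve()
        if target == root or root in target.parents:
            return True
    return False


def on_stop(tool_calls: list[dict], memory_dir: Path,
            extract: Callable[[], None]) -> Optional[threading.Thread]:
    """Launch extraction without waiting for it; skip if the agent already saved."""
    if main_agent_wrote_memory(tool_calls, memory_dir):
        return None  # overlap protection: don't extract the same facts twice
    worker = threading.Thread(target=extract, daemon=True)
    worker.start()  # fire-and-forget: the reply to the user is not delayed
    return worker
```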

Memory File Format

CC uses Markdown + YAML frontmatter, consistent with the teaching version. Four types: user, feedback, project, reference.

memdir.ts:34-38 defines index constraints: MEMORY.md max 200 lines / 25KB. memdir.ts:199-266 builds memory behavior instructions, explicitly distinguishing memory from plan and tasks. Storage location: ~/.claude/projects/<sanitized-git-root>/memory/.
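Below is a sketch of enforcing those two index limits before MEMORY.md is injected. Cutting at whole lines, and reading 25KB as 25 × 1024 bytes, are assumptions. The analysis above gives only the limits, not how CC truncates:

```python
MAX_INDEX_LINES = 200
MAX_INDEX_BYTES = 25 * 1024  # assumption: KB = 1024 bytes


def cap_index(index_text: str) -> str:
    """Keep whole lines of MEMORY.md until either limit would be exceeded."""
    kept, size = [], 0
    for line in index_text.splitlines()[:MAX_INDEX_LINES]:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline
        if size + line_bytes > MAX_INDEX_BYTES:
            break
        kept.append(line)
        size += line_bytes
    return "\n".join(kept)
```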

Dream: Four-Layer Gating

Dream is not "triggered when idle", nor does it "consolidate once the count is high enough"; it must pass four gates (autoDream.ts: defaults at 63-66, gating logic at 130-190):

Time gate: ≥ 24 hours since last consolidation
Scan throttle: Avoid frequent filesystem scans
Session gate: ≥ 5 session transcripts modified since last consolidation
Lock gate: No other process currently consolidating (.consolidate-lock file)

The merge itself runs via forked agent (224-233): locate → collect recent signals → merge and write files → prune and update index. Lock file mtime serves as lastConsolidatedAt. Crash recovery: lock auto-expires after 1 hour.
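The four gates and the lock-file timestamp trick can be sketched together. The 24-hour and 5-session defaults and the 1-hour lock expiry come from the analysis above. The following are hypothetical:

- the scan-throttle interval (10 minutes here),
- transcripts being `*.jsonl` files in one directory,
- the function names.

This is one possible way to reconcile "lock mtime = lastConsolidatedAt" with "lock expires after 1 hour": the holder and claim time are stored inside the file, and the old mtime is restored on acquire. CC's actual mechanism may differ:

```python
import os
import time
from pathlib import Path
from typing import Optional

MIN_INTERVAL_S = 24 * 3600  # time gate (from the analysis above)
MIN_SESSIONS = 5            # session gate (from the analysis above)
LOCK_EXPIRY_S = 3600        # stale-lock recovery (from the analysis above)
SCAN_THROTTLE_S = 10 * 60   # hypothetical: the text gives no value


def last_consolidated_at(lock: Path) -> float:
    """The lock file's mtime doubles as lastConsolidatedAt (0.0 = never)."""
    return lock.stat().st_mtime if lock.exists() else 0.0


def lock_is_held(lock: Path, now: float) -> bool:
    """Held = the lock records a holder whose claim is under an hour old."""
    if not lock.exists():
        return False
    parts = lock.read_text(encoding="utf-8").split()
    if len(parts) != 2:
        return False  # empty (released) or unreadable
    try:
        claimed_at = float(parts[1])
    except ValueError:
        return False
    return now - claimed_at < LOCK_EXPIRY_S  # older claims count as crashed


def should_dream(lock: Path, transcripts: Path, last_scan_at: float, now: float) -> bool:
    last = last_consolidated_at(lock)
    if now - last < MIN_INTERVAL_S:  # 1. time gate
        return False
    if now - last_scan_at < SCAN_THROTTLE_S:  # 2. scan throttle
        return False
    touched = sum(1 for p in transcripts.glob("*.jsonl") if p.stat().st_mtime > last)
    if touched < MIN_SESSIONS:  # 3. session gate
        return False
    return not lock_is_held(lock, now)  # 4. lock gate


def acquire(lock: Path, now: Optional[float] = None) -> None:
    """Record the holder, then restore the old mtime so lastConsolidatedAt is unchanged."""
    now = time.time() if now is None else now
    last = last_consolidated_at(lock)
    lock.write_text(f"{os.getpid()} {now}", encoding="utf-8")
    os.utime(lock, (last, last))


def release(lock: Path, now: Optional[float] = None) -> None:
    """Clear the holder and stamp the finish time as the new lastConsolidatedAt."""
    now = time.time() if now is None else now
    lock.write_text("", encoding="utf-8")
    os.utime(lock, (now, now))
```

The gates are ordered from cheapest to most expensive. Two `stat` calls and a comparison come before the directory scan, so most turns exit early.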

User Memory vs Session Memory

	User Memory	Session Memory
Persistence	Cross-session	Single session
Storage	Multiple .md files in `memory/`	`session-memory/<id>/memory.md`
Loaded into	system prompt	compact summary
Purpose	Cross-session knowledge accumulation	Cross-compact context continuity

sessionMemoryCompact (mentioned in s08) uses Session Memory: before autoCompact, it reads the session memory file and, if sufficient (≥ 10K tokens, ≥ 5 text messages, ≤ 40K tokens, sessionMemoryCompact.ts:56-61), uses it as a summary without calling the LLM.

Where the Real Implementation Is More Complex

Feature flags: Memory features have multiple feature gate layers
Team memory: Shared team memories, loadMemoryPrompt() has a dedicated path (not covered in teaching version)
KAIROS: Timing-aware memory extraction strategy, daily-log mode in loadMemoryPrompt()
Prompt cache: Memory injection must account for prompt cache TTL, avoiding full system prompt rewrites each turn
File locks: Concurrency control for multi-process scenarios
Memory prefetch: Async prefetch, non-blocking main flow

Teaching Version Simplifications Are Intentional

LLM side-query → LLM side-query + keyword fallback: teaching version keeps LLM selection, adds fallback path
Memory JSON → Markdown + frontmatter: teaching version matches CC
Stop hook trigger → stop_reason != "tool_use" branch: same direction
Four-layer gating → file-count threshold: teaching version lacks transcript system and multi-session concepts
Forked agent + restricted permissions → direct call: teaching version has no subprocess isolation