Every agent session is saved; key facts are extracted automatically and tagged to the persona who learned them. The next time that persona is summoned, it already knows what it learned — no docs to maintain, no re-briefing from scratch.
This article was drafted by my crew — an essayist, two technical writers, and two reviewers — then edited and approved by me.
Agents That Remember
March 2026
This is the first article in a series. It describes the memory system as it works today in production. The second, Agents That Coordinate, describes the engineering pipeline. The third, Agents That Connect, describes cross-machine communication. The fourth and fifth — Agents That Wake Up and Agents That Disagree — are written from the agents' perspective. The sixth, What Survives, describes what happens when a session ends. Every CLI block is real output, sanitized for security.
The Status Quo Is a Static File
If you use AI coding agents seriously, you have probably built a workflow around documentation files. A CLAUDE.md that explains your project conventions and describes your architecture. Maybe a collection of design docs that you paste into context when a session starts.
This works, and it is better than nothing, but it has three limits that no amount of careful writing can fix.
It is open-loop. You write the docs, the agent reads them, the agent works, the session ends, the agent's learnings vanish. Nothing flows back. The agent's work does not improve the docs. You, the human, have to notice what the agent learned, extract it, and write it down. If you do not, the knowledge dies with the context window.
It is one-size-fits-all. Every agent reads the same file. An agent debugging your authentication module gets the same context as an agent writing CSS. You cannot give six agents six different histories with a shared document. The moment you scale past one agent, static docs become a ceiling.
It has no forgetting curve. A six-month-old convention and yesterday's critical discovery sit at the same weight. Stale instructions persist at full strength until someone notices and edits them. There is no mechanism to surface what is relevant right now and let what is obsolete fade.
Metateam's memory system solves all three. It is a closed-loop, per-persona, relevance-ranked memory that writes itself from your agents' work sessions — automatically, continuously, and with far less manual maintenance than any documentation-first approach.
What Changes In Practice
Here is the difference in concrete terms.
Without memory, every time you summon an agent for a task, the briefing has to include everything: the project context, the architectural constraints, what was tried before, what failed, what conventions to follow. You are the memory. You serialize context for every handoff because the agent has none of its own.
With memory, the agent arrives already oriented. It has facts from its own previous sessions — the bugs it fixed, the patterns it discovered, the constraints it learned. The briefing shrinks to what changed since last time, not everything from the beginning.
After six weeks of operation, summoning Chen for a P2P transport task takes a two-sentence briefing instead of a page of context. He already knows the wire protocol, the streaming thresholds, the bugs he fixed last month, and the constraints the architect set. None of that was written in a CLAUDE.md. It accumulated automatically from his own work sessions.
The crew lead benefits in the same way. Data, who coordinates every task, accumulates memories about crew dynamics — which communication patterns produce good results, what went wrong in past task pipelines, who works well on what. These are not things you would put in a documentation file. They are operational instincts that build up session by session — the kind of knowledge that makes the difference between a coordinator who is competent and one who is experienced.
This is the difference between a team of contractors and a team of employees. Contractors need the full context every engagement. Employees have institutional memory. Metateam's memory gives AI agents the thing that makes long-term human teams effective: accumulated, personal, domain-specific knowledge that compounds over time.
How It Works: Sessions and Facts
The memory system has two layers, and the reason for separating them matters.
Sessions are the raw record — full transcripts, saved automatically when a session ends. Searchable, retrievable, complete. They are the foundation.
Facts are the distilled knowledge. After a session is saved, an AI model reads the transcript and extracts what matters: decisions made, bugs found, patterns discovered, constraints identified. Facts are small, semantic, and tagged to the persona who generated them.
An agent does not get its last five transcripts dumped into context on startup — that would be too large and too noisy. It gets the learnings from those transcripts. The delta. The things worth remembering.
Every Persona Remembers Differently
This is the design decision that static docs cannot replicate: facts are scoped to personas.
When the system extracts facts from a session, it tags them with the persona who generated them. When that persona is summoned next, it gets its own facts injected. Not someone else's. Not the project's undifferentiated knowledge dump, but its own accumulated domain experience.
$ metateam memory stats
Total facts: 666
With vectors: 639
By project:
- metateam (346)
  - project: 212
  - data: 86
  - chen: 18
  - park: 12
  - santos: 6
  - miller: 6
  - novak: 6
Data has 86 personal facts because Data coordinates every task — crew dynamics, communication patterns, operational history. Chen has 18 facts about P2P transport and performance. Park has 12 about code review patterns and quality defects he has flagged before. Each persona builds a mental model of its own domain through accumulated facts.
$ metateam memory list --project metateam --persona data -n 3
f851d798 metateam · data · 11h ago
The user requested and received compact operator cheat sheets for KB and memory commands...
95379de5 metateam · data · 11h ago
The worker path was switched from RecallAsync to ListAsync for this pre-dedup fetch...
a09f7d4a metateam · data · 11h ago
A CPU hotspot during session upload was traced to SummaryWorker calling RecallAsync...
Showing 3 of 86 (offset 0)
Chen's list would look entirely different: connection handshakes, streaming protocols, wire formats. A CLAUDE.md gives every agent the same blob. Memory gives each one different context based on what it has worked on.
The Closed Feedback Loop
Fact extraction is fully automatic. No human has to notice what the agent learned and write it down. The pipeline:
- An agent works a session — coding, reviewing, debugging, documenting, and so on.
- When the session ends, the transcript is uploaded to the metateam.ai database.
- A background worker picks up the new session.
- An AI model reads the transcript and extracts two types of facts:
  - Project facts: high-level learnings relevant to the entire project. "Deployment script must run with elevated permissions." These are shared — any persona benefits.
  - Session facts: persona-specific artifacts. "Data traced the SummaryWorker CPU hotspot to RecallAsync calling embedding generation in a loop." These belong to the persona who did the work.
- Before storing, the system deduplicates using two thresholds: facts with cosine similarity above 0.92 are rejected as duplicates; facts in the 0.70–0.92 range are checked for contradiction patterns and may still be stored if they represent a genuinely different observation.
- Each fact is embedded into a vector (2560 dimensions by default) for semantic search, enriched with metadata, and indexed for full-text search.
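The dedup step can be sketched in a few lines. The two thresholds come from the pipeline description above; the function names and the contradiction-review flow are illustrative assumptions, not the actual implementation:

```python
import math

DUP_THRESHOLD = 0.92      # above this: rejected as a duplicate
REVIEW_THRESHOLD = 0.70   # 0.70-0.92: checked for contradiction, may still be stored

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two fact embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedup_decision(new_vec: list[float], stored_vecs: list[list[float]]) -> str:
    """Classify a candidate fact against already-stored fact vectors."""
    best = max((cosine(new_vec, v) for v in stored_vecs), default=0.0)
    if best > DUP_THRESHOLD:
        return "reject"   # near-identical fact already stored
    if best >= REVIEW_THRESHOLD:
        return "review"   # similar: run the contradiction check before storing
    return "store"        # clearly a new observation
```

In practice the comparison would run against an approximate-nearest-neighbor index rather than a linear scan, but the decision boundaries are the same.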
This is the closed loop: the agent works, the system extracts what it learned, and feeds it back in the next session. The agent's own output becomes its future input. You cannot replicate this with a static file, no matter how diligently you maintain it.
What Gets Injected at Session Start
When an agent is summoned, the SessionStart hook fires as soon as the agent sees its first message (for Codex CLI, as of now, metateam start is called instead). The system assembles a context payload containing:
- Project facts — shared knowledge, unscoped by persona. Things every agent on this project should know.
- Personal session facts — facts tagged to this specific persona from its previous sessions.
- Important KB entries — deterministic injections marked by the operator (more on this below).
- Recent session summaries — from the same working directory.
The system selects facts by relevance, not by recency alone. Vector similarity, full-text match, access frequency, and recency all contribute to the ranking. Facts that keep getting recalled rank higher. Facts nobody uses sink. The system has a built-in forgetting curve — unlike a static file, where a six-month-old convention and yesterday's discovery sit at the same weight.
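The blended ranking can be sketched as a weighted sum of the four signals. The weights and the 30-day half-life here are assumptions for illustration; the real system's coefficients are not documented in this article:

```python
import math

# Hypothetical weights: vector similarity, full-text match,
# access frequency, and recency all contribute to the score.
W_VECTOR, W_TEXT, W_FREQ, W_RECENCY = 0.5, 0.2, 0.15, 0.15
HALF_LIFE_DAYS = 30.0  # assumed half-life for the forgetting curve

def relevance(vector_sim: float, text_match: float,
              access_count: int, age_days: float) -> float:
    """Blend the four ranking signals into one relevance score.

    Facts that keep being recalled (high access_count) rank higher;
    facts nobody uses decay toward zero with age.
    """
    freq = 1.0 - math.exp(-access_count / 5.0)    # saturating frequency signal
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)  # exponential decay
    return (W_VECTOR * vector_sim + W_TEXT * text_match
            + W_FREQ * freq + W_RECENCY * recency)
```

With a scheme like this, a frequently recalled fact from yesterday outranks an untouched six-month-old fact even at identical vector similarity, which is exactly the forgetting curve a static file lacks.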
From the agent's perspective, it simply "knows things" when it wakes up.
Semantic Recall
Beyond automatic injection, agents can query memory directly. This is semantic search, not keyword matching.
$ metateam memory recall "P2P streaming" -n 3
8d188a9d 0.673 metateam · 12h ago
P2P streaming file transfer: supports inline attachments up to 3.5MB
(base64-encoded in JSON), larger files use streaming protocol with
chunking; verified with 256KB, 1MB, 5MB, and 20MB test files across
devmac-bronto P2P connection.
0c230d93 0.610 metateam · data · 11h ago
Fix implemented in commit c571620: move find_crew_root_for_payload
after stream write, wrap in spawn_blocking, and always write stream
to /tmp first (no IPC needed at that critical moment). Verified: 20MB
P2P streaming now works with byte-exact MD5 match.
78622b5e 0.652 metateam · 11h ago
P2P transport now verified working baseline: 20MB streaming
(byte-exact MD5 match), 1MB inline attachments, bidirectional
messaging, back-notify via spawn_blocking, no TUI freeze.
Every fact traces back to the session that produced it. The provenance chain is unbroken: fact → session → transcript. If an agent recalls a fact, it can pull the original session for full context.
$ metateam memory show 8d188a9d
Memory: 8d188a9d4b124db78179349e9e202cd5
Project: metateam
Session: 173018da
Type: semantic
Domain: technical
Vector: yes
Created: 12h ago (2026-02-28)
P2P streaming file transfer: supports inline attachments up to 3.5MB
(base64-encoded in JSON), larger files use streaming protocol with
chunking; verified with 256KB, 1MB, 5MB, and 20MB test files across
devmac-bronto P2P connection.
Knowledge Base Integration
Facts extracted from sessions are automatic memory. The Knowledge Base is deliberate memory.
When a KB entry is created or updated, it is automatically ingested into the memory system. KB entries marked as important are injected deterministically — every session for that project gets them, regardless of relevance scoring. Memory facts are injected semantically, ranked by relevance to the current context. The two systems reinforce each other.
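The two injection modes can be sketched as one assembly step: important KB entries are pinned unconditionally, while memory facts compete for the remaining slots by relevance score. The types and function below are hypothetical, a sketch of the behavior rather than the actual payload builder:

```python
from dataclasses import dataclass

# Hypothetical types; the real payload assembly is internal to metateam.
@dataclass
class KbEntry:
    text: str
    important: bool   # marked by the operator

@dataclass
class Fact:
    text: str
    relevance: float  # blended ranking score

def assemble_context(kb_entries: list[KbEntry],
                     facts: list[Fact], k: int = 20) -> list[str]:
    """Deterministic injections first, then semantic ones.

    Important KB entries are always included, regardless of relevance
    scoring; memory facts fill the top-k slots ranked by relevance.
    """
    pinned = [e.text for e in kb_entries if e.important]
    ranked = sorted(facts, key=lambda f: f.relevance, reverse=True)[:k]
    return pinned + [f.text for f in ranked]
```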
This is where persona memory has another advantage over static docs: if one extracted fact is wrong, you fix or delete that fact and the correction propagates to future sessions. You do not rewrite the entire shared instruction surface. Corrections are granular.
When Memory Goes Wrong
The system extracts facts using an AI model, which means it can extract wrong facts. A misinterpreted code comment, a premature conclusion, a fact that was true at extraction time but is no longer true after a refactor.
There are several defenses. A periodic cleanup job reviews stored facts and flags low-quality ones for deletion. Facts can also be removed manually:
$ metateam memory forget a09f7d4a -y
And facts can be stored manually to correct the record:
$ metateam memory remember "P2P transport baseline verified: 20MB streaming, 1MB inline, bidirectional, no TUI freeze" --project metateam --persona chen
The honest answer is that wrong memories are a real operational risk. A stale fact about a deprecated API can mislead an agent. But the same risk exists with CLAUDE.md — stale instructions rot silently at full weight until someone notices. The difference is that memory demotes unused facts in relevance ranking over time and provides a clear path to fix individual errors without rewriting the entire document.
Resilience
The memory system never blocks an agent from working. If the embedding model is down, the fact is stored without a vector and queued for background reconciliation — a separate worker retries later, configured to use a separate endpoint by default so it does not compete with user-facing recall.
If the embedding model changes — a newer model with different vector dimensions — the system handles migration automatically. It detects the dimension mismatch, re-queues every fact for re-embedding, and rebuilds the index. No manual intervention, no data loss. The facts live in SQLite; the vectors are a derived index that can always be rebuilt.
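The migration logic described above reduces to a simple invariant: facts are the source of truth, vectors are derived. The sketch below illustrates that shape; the class and field names are assumptions, not the real implementation:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the automatic re-embedding migration.
@dataclass
class FactStore:
    index_dimensions: int            # dimensions of the current vector index
    facts: list = field(default_factory=list)
    embed_queue: list = field(default_factory=list)

def migrate_if_needed(store: FactStore, model_dimensions: int) -> bool:
    """Detect a vector-dimension mismatch and rebuild the derived index.

    The facts themselves live in SQLite and are never touched; every fact
    is re-queued for re-embedding, so there is no data loss.
    """
    if store.index_dimensions == model_dimensions:
        return False                       # index already matches the model
    store.embed_queue = list(store.facts)  # re-queue every fact
    store.index_dimensions = model_dimensions
    return True
```

Because the vectors are rebuilt rather than migrated in place, swapping in a model with different dimensions (2560 by default, per the pipeline above) requires no manual intervention.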
The Real Difference
A CLAUDE.md is a good start. But it is static context that you maintain by hand, shared identically across all agents, with no feedback loop and no forgetting curve.
Persona memory is dynamic context that writes itself from real work, scoped to each specialist, ranked by relevance, with automatic accumulation and graceful decay. It captures tacit operational judgment — not just rules you wrote down, but instincts each agent built through experience.
The crew gets better at its job over time, not only because the underlying models improve, but because the memory around those models accumulates. After enough sessions, each agent carries a personal history of its domain that no static file could replicate — and that no human should have to maintain by hand.
Stop maintaining what your agents should already know.
Register at metateam.ai and start a crew in your repo.