The 3 layers of AI memory (and why most tools still stop at layer 2)
Talk to any engineer using AI tools daily and the same complaint still comes up: “I keep re-explaining the same context.” (Yes, they set up the claude.md and whatnot, but the pesky memory problem still won’t go away.)
In 2023 that was mostly because tools had no real memory.
In 2026 that excuse is gone. ChatGPT ships Memory (and maybe I’m one of the few who happily gives it his data), Claude has Projects and team memory too (although I first saw them in juma), Mem0 and similar platforms act as universal memory layers for agents, and even open‑source stacks now come with drop‑in memory components.
The real question is no longer “does this tool remember?” but what kind of memory it has, and at what scope.
This article proposes a simple way to think about that: three layers of AI memory, from local to organizational, and how they map onto the cognitive model of episodic, semantic, and procedural memory.
How this fits with episodic, semantic, and procedural memory
Most serious writing on agent memory now uses a cognitive lens: episodic memory for specific experiences, semantic memory for stable facts and concepts, procedural memory for how to do things, and working memory for what is “on the mind” right now.
A good overview is Memory Systems in AI Agents: Episodic vs. Semantic and similar pieces on long‑term agent memory.
- Episodic memory in AI agents looks like event logs and traces: conversations, tool calls, workflows, incident timelines.
- Semantic memory is the distilled knowledge: “Service A depends on Service B,” “Team Alpha owns payments,” “we use OAuth2 for auth.”
- Procedural memory captures policies and habits: “how we onboard a new engineer,” “our incident runbooks,” “how we do a safe rollback.”
- Working memory is the current context window: the messages, code, and facts the model is actively considering.
This framework doesn’t replace that model. It asks a different question: where do these memories live, who gets to benefit from them, and for how long?
Every episodic, semantic, and procedural fact your AI tools use today ends up in one of three layers of memory: local, product, or shared.
The 3 layers of AI memory
Most AI memory conversations blur together two different axes:
- Scope: is the memory just for this chat? This user in this product? Or the whole organization?
- Architecture: is it just text and vectors? Or a structured graph of entities, relationships, and events?
From that perspective, modern AI memory falls into three layers:
- Layer 1 – Local memory: Lives inside a single chat or project. Includes large context windows, per‑thread history, and small per‑user notes tied to one interface.
- Layer 2 – Product memory: Lives inside one product or account. Includes ChatGPT Memory, Claude’s Projects and team memory, and Mem0/Supermemory‑style stores configured as the memory engine for a single app.
- Layer 3 – Shared (organization) memory: Lives outside any single chat or product. A dedicated memory layer (graph + vectors + events) that pulls from code, docs, tickets, and conversations, and is shared across tools and teams.
You can implement episodic, semantic, and procedural memory at any of these layers. The difference is who gets to benefit: just you in one app, or your whole team across everything they use.
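To make that concrete, here’s a minimal sketch (hypothetical types, not any product’s actual schema) of memory type and memory scope as two independent attributes of the same record:

```python
# Illustrative only: cognitive memory type and scope (the three layers)
# modeled as orthogonal attributes of a single memory record.
from dataclasses import dataclass
from enum import Enum

class MemoryType(Enum):
    EPISODIC = "episodic"      # event logs, traces, conversations
    SEMANTIC = "semantic"      # distilled facts and concepts
    PROCEDURAL = "procedural"  # runbooks, policies, habits

class MemoryScope(Enum):
    LOCAL = "local"      # Layer 1: one chat or project
    PRODUCT = "product"  # Layer 2: one product or account
    SHARED = "shared"    # Layer 3: the whole organization

@dataclass
class MemoryRecord:
    content: str
    type: MemoryType
    scope: MemoryScope

# The same semantic fact can live at very different scopes:
fact = "Team Alpha owns the payments service"
print(MemoryRecord(fact, MemoryType.SEMANTIC, MemoryScope.LOCAL))   # one chat knows it
print(MemoryRecord(fact, MemoryType.SEMANTIC, MemoryScope.SHARED))  # the whole org knows it
```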
```mermaid
graph TD
    subgraph L1["Layer 1: Local Memory"]
        L1A["Lives in the context window"]
        L1B["Scoped to a chat or project"]
        L1C["Short-lived, per-thread"]
    end
    subgraph L2["Layer 2: Product Memory"]
        L2A["Survives across sessions"]
        L2B["Remembers per user/account"]
        L2C["Scoped to one product"]
    end
    subgraph L3["Layer 3: Shared Memory"]
        L3A["Team- and org-wide"]
        L3B["Graph of entities & events"]
        L3C["Accessible from many tools"]
    end
    L1 --> L2 --> L3
    style L1 fill:#1e1e3a,stroke:#6366f1,color:#e2e8f0
    style L2 fill:#1e1e3a,stroke:#a78bfa,color:#e2e8f0
    style L3 fill:#1e1e3a,stroke:#22c55e,color:#e2e8f0
```
Most AI tools today reach at least Layer 2. A few are pushing into Layer 3. That top layer is where the biggest gains are.
Layer 1: Local memory
Layer 1 is the working memory of your AI tools.
It’s the context window and immediate history of your interactions: everything you paste, type, or generate in a single chat or project. Modern models handle 200K+ tokens; some tools effectively treat your entire recent history as “current context.”
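In code, Layer 1 amounts to little more than a per‑thread message buffer, trimmed to a token budget and resent on every turn. A rough sketch, illustrative only and not how any particular tool implements it:

```python
# Illustrative sketch of Layer 1: working memory is just the recent message
# history, trimmed to a token budget and resent to the model every turn.

def rough_token_count(text: str) -> int:
    # Crude approximation: roughly 4 characters per token.
    return max(1, len(text) // 4)

class ThreadMemory:
    def __init__(self, token_budget: int = 200_000):
        self.token_budget = token_budget
        self.messages: list[dict] = []  # [{"role": ..., "content": ...}]

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Drop the oldest messages once the budget is exceeded;
        # this is exactly where Layer 1 "forgetting" happens.
        while sum(rough_token_count(m["content"]) for m in self.messages) > self.token_budget:
            self.messages.pop(0)

    def context(self) -> list[dict]:
        # Everything the model "knows" in this thread is whatever is left here.
        return list(self.messages)

thread = ThreadMemory(token_budget=30)  # tiny budget so the trimming is visible
thread.add("user", "Our auth flow uses OAuth2 with a custom token exchange step.")
thread.add("assistant", "Got it, I'll keep that in mind for this refactor.")
thread.add("user", "Now also update the retry logic in the payments client.")
print([m["content"][:25] for m in thread.context()])  # the earliest message is already gone
```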
On top of that raw context, many tools now add a thin layer of per‑thread or per‑file state:
- Claude keeps track of your current Project and associated files, so it can reason within that workspace without you re‑attaching everything every time.
- ChatGPT keeps recent messages and lightweight “saved memories” in play to maintain coherence across short sessions.
Local memory is enough for:
- Single‑session tasks (“refactor this function”)
- Quick explorations (“how would I implement X?”)
- Short investigations where you don’t care if the context is forgotten tomorrow
But it has clear limits:
- Once a session ends or ages out, most episodic traces are gone.
- Useful insights are trapped as unstructured text inside one conversation.
- Nothing that happened here automatically becomes part of your team’s shared understanding.
```mermaid
graph TD
    MON["Monday: Teach AI your auth flow"] --> MON_R["Great refactor produced"]
    TUE["Tuesday: New thread"] --> TUE_R["AI has no idea what auth flow you use"]
    WED["Wednesday: New thread"] --> WED_R["Re-explain the same context again"]
    MON_R -.- LOST["Knowledge lost at thread boundary"]
    LOST -.- TUE
    TUE_R -.- LOST2["Knowledge lost at thread boundary"]
    LOST2 -.- WED
    style LOST fill:#0f172a,stroke:#ef4444,color:#ef4444
    style LOST2 fill:#0f172a,stroke:#ef4444,color:#ef4444
```
Local memory answers: “What is this tool thinking with right now?”
Layer 2: Product memory
Layer 2 is what most people now mean when they say “my AI tool has memory.”
It’s persistent, product‑scoped, usually user‑scoped knowledge that survives across sessions in that one app:
- ChatGPT Memory stores facts like “I work on a payments service in Go” and “I prefer TypeScript,” then automatically injects them into relevant new chats.
- Claude’s memory keeps track of your team’s projects, collaborators, and preferences, and lets you inspect and edit what it remembers, with separate memory spaces per project or workspace.
- Mem0 and similar platforms extract structured facts from interactions (“user prefers aisle seats,” “this repo uses Next.js + Tailwind”) and feed them into prompts for a single assistant or app.
In cognitive terms, product memory is where episodic traces from many interactions are distilled into semantic facts and procedural habits for that product.
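The mechanics are roughly this simple (a hedged sketch assuming a per‑user fact store whose contents get injected into each new session’s system prompt; not any vendor’s actual implementation):

```python
# Illustrative sketch of Layer 2: a per-user, per-product fact store whose
# contents are injected into the system prompt of every new session.
from collections import defaultdict

class ProductMemory:
    def __init__(self, product: str):
        self.product = product
        self.facts: dict[str, list[str]] = defaultdict(list)  # user_id -> distilled facts

    def remember(self, user_id: str, fact: str) -> None:
        # In real products, an LLM distills episodic chat history
        # into short semantic facts like these.
        self.facts[user_id].append(fact)

    def system_prompt(self, user_id: str) -> str:
        remembered = "\n".join(f"- {f}" for f in self.facts[user_id])
        return f"Things you know about this user of {self.product}:\n{remembered}"

memory = ProductMemory("chat-assistant")
memory.remember("alice", "Works on a payments service written in Go")
memory.remember("alice", "Prefers TypeScript for frontend tooling")
print(memory.system_prompt("alice"))
# Note the scoping: Bob's facts live in a separate bucket, and none of this
# is visible to other tools. That is the Layer 2 limitation in miniature.
```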
The win: you stop re‑introducing yourself every morning. The AI knows your stack, your style, your recurring patterns.
But it still misses a lot:
| Product memory handles | Product memory misses |
|---|---|
| Your personal preferences | What the rest of your team knows |
| Your past conversations in that product | Knowledge from other tools and agents |
| Facts you explicitly told it | Changes that happened outside the product |
| One product’s context | Cross‑tool, cross‑team continuity |
The core limitation: it’s tool‑scoped and often user‑scoped. It knows what you told this product. It has no idea what your teammate figured out in a different tool yesterday, or that the architecture changed because of a PR you weren’t tagged on. At best, it treats one canonical artifact (the repo, in Cursor’s case) as its single source of truth.
```mermaid
graph TD
    subgraph ALICE["Alice"]
        A_CL["Claude Memory"] --- A_CLD["Knows Alice's preferences"]
        A_CU["Cursor Memory"] --- A_CUD["Knows Alice's code patterns"]
    end
    subgraph BOB["Bob"]
        B_CL["Claude Memory"] --- B_CLD["Knows Bob's preferences"]
        B_CU["Cursor Memory"] --- B_CUD["Knows Bob's code patterns"]
    end
    A_CL -.- NO1["Can't share directly"]
    NO1 -.- B_CL
    A_CU -.- NO2["Can't share directly"]
    NO2 -.- B_CU
    style NO1 fill:#0f172a,stroke:#ef4444,color:#ef4444
    style NO2 fill:#0f172a,stroke:#ef4444,color:#ef4444
```
Product memory answers: “What does this product remember about this user and their work?”
That’s enough to make single‑tool workflows feel smoother. It is still not enough to make AI feel like a member of your organization.
Because in reality every team has its own single source of truth. Sales are praying to the CRM, engineers are meditating on the repo, marketing is salivating over the blog, and ops are religious about Notion. You get the idea.
Layer 3: Shared (organization) memory
Layer 3 is where long‑term organizational memory lives.
Instead of “facts per user per tool,” shared memory operates at the organization level. It ingests knowledge from multiple systems (code, docs, tickets, incidents, ADRs, Slack) and maintains it as a graph of entities, relationships, and events, often combined with vector indexes for semantic search.
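Concretely, the data model underneath looks something like this: typed entities, typed relationships between them, and timestamped events that reference both. (A simplified sketch with made-up example data; real systems add vector indexes, provenance, and access control on top.)

```python
# Illustrative sketch of the Layer 3 data model: entities, relationships,
# and timestamped events, all cross-referencing each other by id.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Entity:
    id: str
    kind: str   # "service", "team", "api", "migration", ...
    name: str

@dataclass
class Relationship:
    source: str     # entity id
    relation: str   # "owns", "depends_on", "affects", ...
    target: str     # entity id

@dataclass
class Event:
    timestamp: datetime
    description: str
    entities: list[str] = field(default_factory=list)  # entity ids involved

# A tiny slice of an organization's graph:
entities = [
    Entity("team-alpha", "team", "Team Alpha"),
    Entity("svc-payments", "service", "Payment Service"),
    Entity("mig-oauth2", "migration", "OAuth2 Migration"),
]
relationships = [
    Relationship("team-alpha", "owns", "svc-payments"),
    Relationship("mig-oauth2", "affects", "svc-payments"),
]
events = [
    Event(datetime(2026, 1, 12), "OAuth2 migration completed for Payment Service",
          ["mig-oauth2", "svc-payments"]),
]
```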
Enterprise AI thought leadership increasingly treats knowledge graphs as the natural plane for this kind of memory: a durable layer that stores relationships, decisions, and history across fragmented systems and isolated knowledge silos, turning LLMs into grounded assistants instead of amnesiac chatbots that work from a single source of truth.
A shared memory layer typically has a few defining traits:
- A temporal knowledge graph that knows what exists (services, teams, APIs, migrations), how they connect, and when they changed (for example, architectures like Zep’s temporal knowledge graph).
- Separate stores for episodic and semantic memory: raw event logs and distilled facts, tied together by entities in the graph.
- Event ingestion so that PRs, deployments, and doc updates automatically refresh the memory, instead of relying on someone to “update the wiki/Notion.”
- Clean interfaces (APIs, MCP, SDKs) so any agent or tool can query and update it; a sketch of what that looks like from an agent’s side follows this list.
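Here is roughly the shape of that agent‑facing interface: a hedged sketch that uses a small in‑process stub in place of what would really be an API or MCP server.

```python
# Illustrative sketch: a narrow interface that any agent or tool can use to
# read from and write to shared memory. The stub stands in for a real backend.
from typing import Protocol

class SharedMemory(Protocol):
    def search(self, query: str) -> list[str]: ...
    def record_event(self, description: str) -> None: ...

class InMemoryStub:
    """Stand-in backend; a real one would sit behind an API or MCP server."""
    def __init__(self) -> None:
        self.facts = ["Payment Service is owned by Team Alpha"]
        self.events: list[str] = []

    def search(self, query: str) -> list[str]:
        words = [w.strip("?.,").lower() for w in query.split()]
        return [f for f in self.facts if any(w and w in f.lower() for w in words)]

    def record_event(self, description: str) -> None:
        self.events.append(description)

def answer_with_memory(memory: SharedMemory, question: str) -> str:
    context = memory.search(question)
    # A real agent would feed `context` into the model prompt; here we just show it.
    return f"Question: {question}\nRetrieved org context: {context}"

print(answer_with_memory(InMemoryStub(), "Who owns the payment service?"))
```

The point is not the stub but the shape of the contract: as long as every tool speaks this interface, Alice in Cursor, Bob in Claude, and a CI agent all draw on the same memory.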
From an engineer’s perspective, Layer 3 quietly answers questions like:
- “Have we seen this incident before? What was the root cause and fix?”
- “How are we repurposing the changelog into sales pitch decks?”
- “What did we decide about client 007 and their requirements?”
```mermaid
graph TD
    subgraph SH["Shared Memory Layer"]
        direction TB
        K1["Architectural decisions"]
        K2["Team conventions"]
        K3["Service ownership"]
        K4["Migration status"]
        K5["Past incident solutions"]
    end
    SH -->|MCP/API| T1["Alice in Cursor"]
    SH -->|MCP/API| T2["Bob in Claude"]
    SH -->|MCP/API| T3["CI Pipeline Agent"]
    SH -->|MCP/API| T4["Custom Internal Agent"]
    K1 --- REL1["OAuth2 Migration -- affects --> Payment Service"]
    K3 --- REL2["Payment Service -- owned by --> Team Alpha"]
    style SH fill:#1e1e3a,stroke:#22c55e,color:#e2e8f0
```
Relationship awareness
Layers 1 and 2 mostly store facts as isolated text or embeddings. Layer 3 stores knowledge as a graph of connected entities, and that changes what the AI can actually do with it.
Consider this question: “Should I use the old authentication pattern or the new one?”
| Memory layer | What happens |
|---|---|
| Local | AI has no idea. You explain from scratch. |
| Product | AI remembers you mentioned OAuth2 last week. No other context. |
| Shared (graph) | AI traverses: OAuth2 Migration affects Service A, B, C. Services A, B migrated. C, D pending. Recommends new pattern for new code, old pattern only for Services C, D until migration completes. |
The graph doesn’t just find relevant text. It traverses relationships to construct a complete, current answer.
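A toy version of that traversal, using made‑up services and migration data purely for illustration:

```python
# Illustrative sketch: answering "old or new auth pattern?" by walking
# relationships and events rather than matching text.
relationships = [
    ("OAuth2 Migration", "affects", "Service A"),
    ("OAuth2 Migration", "affects", "Service B"),
    ("OAuth2 Migration", "affects", "Service C"),
    ("OAuth2 Migration", "affects", "Service D"),
]
events = [
    ("2026-01-05", "migration completed", "Service A"),
    ("2026-01-12", "migration completed", "Service B"),
]

def auth_recommendation(service: str) -> str:
    affected = ("OAuth2 Migration", "affects", service) in relationships
    if not affected:
        return f"{service}: unaffected by the migration, use the standard pattern."
    migrated = any(kind == "migration completed" and who == service
                   for _, kind, who in events)
    if migrated:
        return f"{service}: already migrated, use the new OAuth2 pattern."
    return f"{service}: migration pending, keep the old pattern for now."

for svc in ("Service B", "Service C"):
    print(auth_recommendation(svc))
```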
Temporal awareness
Shared memory knows when things happened. It distinguishes between a decision made today and one from six months ago that’s since been reversed.
Benchmarks like LoCoMo and LongMemEval show how badly models struggle once conversations stretch across many sessions and require temporal reasoning: even long‑context models and basic RAG approaches see accuracy drops of 30–60% on the hardest scenarios.
Without temporal context, you get confidently wrong answers based on stale data. An AI tool recommending your old auth pattern because it doesn’t know you migrated last week? That’s not a hallucination problem. It’s a staleness problem.
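One common way to model this (a sketch assuming simple validity windows, not any specific product’s schema) is to attach a valid‑from/valid‑to range to each fact and answer queries “as of” a point in time:

```python
# Illustrative sketch: temporal facts carry a validity window, and queries
# are answered "as of" a given date so stale facts never leak into answers.
from datetime import date

facts = [
    {"fact": "Auth uses the legacy session-token pattern",
     "valid_from": date(2024, 3, 1), "valid_to": date(2026, 1, 10)},
    {"fact": "Auth uses OAuth2 with PKCE",
     "valid_from": date(2026, 1, 10), "valid_to": None},  # still current
]

def facts_as_of(when: date) -> list[str]:
    return [f["fact"] for f in facts
            if f["valid_from"] <= when
            and (f["valid_to"] is None or when < f["valid_to"])]

print(facts_as_of(date(2025, 6, 1)))   # the old pattern: correct for that moment
print(facts_as_of(date(2026, 1, 20)))  # the new pattern: avoids the staleness trap
```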
Self‑maintenance
The memory layer watches your sources and updates itself. A PR merge that changes the API contract? Reflected. Decision doc updated? Reflected. That one client demanding edge cases? Closed and escalated. Nobody has to maintain it by hand.
If you’ve ever watched a wiki rot within weeks of being written, you know why this matters.
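A sketch of what that self‑maintenance loop might look like, with a hypothetical webhook‑style event shape and an invented PR number purely for illustration:

```python
# Illustrative sketch: source events (PR merged, doc edited) update the
# memory directly, so nobody has to remember to update the wiki.
from datetime import datetime

knowledge = {"Payment Service API": "v1 contract (see ADR-007)"}
audit_log: list[tuple[datetime, str]] = []

def on_source_event(event: dict) -> None:
    """Handle a webhook-style event from a source system (hypothetical shape)."""
    if event["type"] == "pr_merged" and event.get("changes_api_contract"):
        knowledge["Payment Service API"] = event["new_contract"]
        audit_log.append((datetime.now(), f"Updated API contract from PR #{event['pr']}"))
    elif event["type"] == "doc_updated":
        knowledge[event["doc"]] = event["summary"]
        audit_log.append((datetime.now(), f"Refreshed doc: {event['doc']}"))

on_source_event({"type": "pr_merged", "pr": 4821, "changes_api_contract": True,
                 "new_contract": "v2 contract: idempotency keys required"})
print(knowledge["Payment Service API"])  # already reflects the merged PR
```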
Shared memory answers: “What does our organization know, how is it connected, and how is it changing over time, regardless of which tool you’re using right now?”
How each layer handles real scenarios
Scenario: New engineer onboarding
| Question | Local | Product | Shared |
|---|---|---|---|
| “How do I set up local dev?” | Paste the doc manually | AI remembers the engineer asked before | AI pulls current setup guide, flags that DB was changed from Postgres to SQLite last week |
| “Who owns the payment service?” | No idea | No idea | Team Alpha, lead: Alice, review policy: 2 approvals |
| “Why do we use this pattern?” | No idea | No idea | Links to ADR‑007 from last quarter, authored by Bob |
Scenario: Debugging a production incident
| Question | Local | Product | Shared |
|---|---|---|---|
| “Have we seen this error before?” | No history | Remembers your past sessions | Finds the incident from 3 months ago, the root cause, and the fix, pulling from the whole team’s history |
| “What changed recently?” | You paste the deploy log | Limited context | Correlates the error with Friday’s deploy, the specific PR, and the engineer who can explain the change |
Scenario: Architecture decision
| Question | Local | Product | Shared |
|---|---|---|---|
| “Should we use Redis or Postgres for this cache?” | Generic pros/cons | Remembers your past preference | Knows you already run Redis for sessions, that the ops team prefers fewer datastores, sales are pushing for realtime, and that a similar decision was made for the notification service last month |
The pattern is clear: the more your tasks depend on time, relationships, and cross‑system knowledge, the more you need Layer 3‑style memory.
Where current tools sit on this spectrum
Given these three layers, the current landscape looks less like “who has memory?” and more like “how far up the stack do they go?”:
```mermaid
graph TD
    subgraph LAYER1["Layer 1: Local"]
        L1C["Raw APIs"]
        L1D["Simple wrappers"]
    end
    subgraph LAYER2["Layer 2: Product Memory"]
        P1["ChatGPT Memory"]
        P2["Claude Projects & team memory"]
        P3["Mem0 (per-app mode)"]
    end
    subgraph LAYER3["Layer 3: Shared (Org-wide)"]
        S1["Zep temporal KG"]
        S2["Cognee-style stacks"]
        S3["Knowledge Plane"]
    end
    LAYER1 -->|"Add per-user persistence"| LAYER2
    LAYER2 -->|"Add team + graph + events + MCP/API"| LAYER3
    style LAYER1 fill:#1e1e3a,stroke:#6366f1,color:#e2e8f0
    style LAYER2 fill:#1e1e3a,stroke:#a78bfa,color:#e2e8f0
    style LAYER3 fill:#1e1e3a,stroke:#22c55e,color:#e2e8f0
```
Most of the AI memory space is focused on Layer 2, making individual tools remember individual users better. That’s valuable, but it doesn’t solve the team problem. We wrote honest reviews of Mem0, Zep, Cognee, Letta, and LangMem in our AI Memory Landscape.
Layer 3 is where things are still wide open. What collaborative teams actually need (shared, structured, current, tool‑agnostic memory) is basically a different architecture from per‑user persistence, different enough that it probably can’t be bolted onto Layer 2 tools after the fact.
What to look for if you care about team memory
If you’re evaluating AI memory for your team, “has memory” is too vague. The important questions are:
Scope
- Does this memory apply to one chat, one product, one user, or to the entire team?
- Can multiple agents and tools query and update the same memory layer?
Structure
- Is everything stored as blobs of text and vectors, or is there a graph of entities, relationships, and events?
- Can the system answer questions that depend on ownership, dependencies, and time (“who owns X now?”, “what changed since Y?”)?
Freshness
- Does the memory update itself when code, docs, or tickets change, or is someone expected to “update the wiki”?
- How quickly do PRs, ADRs, and incidents propagate into the memory?
Governance
- Is there provenance on facts? Can you see which document or event a conclusion came from?
- Can you handle contradictions and outdated knowledge in a principled way?
Access
- Is there a clean API or protocol (like MCP) for any tool to use it?
- Can you run it where your data lives (self‑hosted / VPC), if that matters?
Tools that score well on these questions are operating at Layer 3, even if they use different terminology.
Where this leaves us
The reason AI tools don’t feel intelligent anymore isn’t a lack of memory. They lack the right kind of memory, in the right place.
Local memory (Layer 1) makes single sessions powerful. Product memory (Layer 2) makes individual tools feel personal and reduces repetition. But shared, organization‑level memory (Layer 3) is what lets AI behave like a true teammate aware of your tribal knowledge: architecture, history, decisions, and the fact that yesterday’s migration changed how everything fits together.
As more tools quietly climb from “no memory” to Layer 2, the differentiator won’t be “we have memory.”
It will be who owns your organization’s memory, how it’s structured, and which tools are allowed to see it.
Venture Builder @ Knowledge Plane
Obsessed with high-performing teams and validation-driven development. Forbes 30 under 30. Building at Camplight and Knowledge Plane.