How does an AI agent remember?
A language model on its own has no memory between turns: every request starts from a blank slate. Agent memory is the set of mechanisms that let an agent encode, store, and retrieve what happened before. Learn the difference between short-term memory (the context window) and long-term memory (an external store), meet the memory types agents borrow from cognitive science, and watch a conversation flow through both, right here on the page.
01 Why an agent needs memory at all
A large language model is stateless: by itself it remembers nothing from one call to the next. The only thing it can "see" is whatever text you hand it in the current request. So when an agent appears to remember your name, a decision from three turns ago, or a fact it looked up yesterday, that memory isn't inside the model — it's a system built around the model that feeds the right information back in. Surveys describe agent memory along a simple cycle: encode (turn an experience into something storable), store (keep it somewhere), and retrieve (pull it back when it's relevant).
- The model is stateless — memory is plumbing the system adds around it, not a property of the model itself.
- A useful mental model is encode → store → retrieve: capture an experience, keep it, and bring it back when it matters (per memory surveys).
- Without memory an agent can't carry context across turns, learn from past attempts, or personalise to a user over time.
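The encode → store → retrieve cycle can be sketched in a few lines of Python. This is a minimal illustration, not any framework's API: `MemoryRecord`, `SimpleMemory`, and `build_prompt` are hypothetical names, and keyword overlap stands in for whatever encoding and ranking a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    """One stored experience: the raw text plus keywords used for lookup."""
    text: str
    keywords: frozenset


@dataclass
class SimpleMemory:
    """Hypothetical in-process store; a real system would use a database."""
    records: list = field(default_factory=list)

    def encode(self, experience: str) -> MemoryRecord:
        # Encode: turn raw text into something storable and searchable.
        words = {w.strip(".,!?").lower() for w in experience.split()}
        return MemoryRecord(experience, frozenset(w for w in words if len(w) > 3))

    def store(self, record: MemoryRecord) -> None:
        # Store: keep the record outside the stateless model.
        self.records.append(record)

    def retrieve(self, query: str, k: int = 1) -> list:
        # Retrieve: rank records by keyword overlap with the query.
        q = self.encode(query).keywords
        ranked = sorted(self.records, key=lambda r: len(q & r.keywords), reverse=True)
        return [r.text for r in ranked[:k] if q & r.keywords]


def build_prompt(memory: SimpleMemory, user_message: str) -> str:
    # The model only "remembers" what is pasted back into the request.
    recalled = memory.retrieve(user_message)
    context = "\n".join(f"- {fact}" for fact in recalled) or "- (nothing recalled)"
    return f"Known facts:\n{context}\n\nUser: {user_message}"


memory = SimpleMemory()
memory.store(memory.encode("The user's name is Priya."))
memory.store(memory.encode("The user prefers metric units."))
prompt = build_prompt(memory, "Which units should the report use?")
print(prompt)
```

The point of the design is that the model call itself never changes. Only the text assembled by `build_prompt` differs from one request to the next.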
02 Short-term vs. long-term memory
The first split to learn is short-term versus long-term. Short-term (working) memory is the information held in the model's context window during a single session or conversation thread — it's right there in front of the model, but it disappears when the session ends and it's strictly finite. Long-term memory is information deliberately persisted in an external store and recalled across sessions and threads. Production frameworks make this concrete: in LangGraph, short-term memory is handled by thread-scoped checkpointers, while long-term memory lives in cross-thread stores. The everyday metaphor — popularised by the MemGPT line of work and its framework Letta — is a computer: the context window behaves like RAM (fast, limited, wiped on restart) and the external store like disk (slower, durable, large).
- Short-term: the context window, scoped to one thread/session; immediate but finite and lost when the session ends.
- Long-term: an external store (often a database or vector store) that survives across sessions and threads.
- LangGraph maps this directly: checkpointers (short-term, per-thread) vs. stores (long-term, cross-thread).
- The RAM-vs-disk analogy comes from the MemGPT/Letta "agent-as-operating-system" framing.
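To make the thread-scoped versus cross-thread split concrete, here is a plain-Python sketch. The two classes are hypothetical stand-ins for the idea of a per-thread checkpointer and a cross-thread store. They are not LangGraph's actual classes or method signatures.

```python
from collections import defaultdict
from typing import Optional


class ThreadScopedMemory:
    """Stand-in for short-term memory: message history kept per thread."""

    def __init__(self) -> None:
        self._threads = defaultdict(list)

    def append(self, thread_id: str, message: str) -> None:
        self._threads[thread_id].append(message)

    def history(self, thread_id: str) -> list:
        # A new thread starts empty: nothing leaks in from other threads.
        return list(self._threads.get(thread_id, []))


class CrossThreadStore:
    """Stand-in for long-term memory: facts keyed by user, shared by all threads."""

    def __init__(self) -> None:
        self._facts = {}

    def put(self, user_id: str, key: str, value: str) -> None:
        self._facts[(user_id, key)] = value

    def get(self, user_id: str, key: str) -> Optional[str]:
        return self._facts.get((user_id, key))


short_term = ThreadScopedMemory()
long_term = CrossThreadStore()

# Session 1: the user states a preference and the agent persists it.
short_term.append("thread-1", "user: please always answer in French")
long_term.put("user-42", "language", "French")

# Session 2: a fresh thread has no history, but the stored fact survives.
new_thread_history = short_term.history("thread-2")
remembered_language = long_term.get("user-42", "language")
print(new_thread_history, remembered_language)
```

Keying the long-term store by user rather than by thread is the design choice that lets a fact outlive the conversation it was learned in.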
03 The context window is the agent's working memory
Whatever sits in the context window is what the model can actually reason over right now — its working memory. The catch is that the window is finite, and Anthropic and others document that accuracy and recall tend to degrade as it fills up, a pattern sometimes called "context rot." So once a conversation grows past the budget, the agent has to do something. Common moves: OpenAI's (now-deprecated) Assistants threads auto-truncate older messages when history exceeds the model's context length; LlamaIndex keeps a short-term FIFO queue with a token limit and flushes the oldest messages into long-term memory when it overflows; and Anthropic recommends compaction and context editing — curating the working set of tokens rather than letting it grow unchecked. The skill of deciding what to keep, summarise, or drop is often called context engineering.
- The context window holds everything the model can reference for the current response — its working memory.
- It's finite, and recall/accuracy can degrade as it fills ("context rot," per Anthropic).
- When it overflows, agents truncate, summarise/compact, or flush older content out to long-term memory.
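The overflow behaviour described above (a FIFO queue with a token limit that flushes its oldest messages) can be sketched as follows. This is an illustration of the general pattern under stated assumptions, not LlamaIndex's implementation: `ShortTermBuffer` is a hypothetical class, tokens are approximated by whitespace-split words, and a plain list stands in for long-term memory.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: whitespace-split words approximate tokens. A real agent
    # would use the model's own tokenizer.
    return len(text.split())


class ShortTermBuffer:
    """FIFO message buffer with a token budget (hypothetical class)."""

    def __init__(self, token_limit: int) -> None:
        self.token_limit = token_limit
        self.messages = deque()
        self.flushed = []  # stands in for long-term memory

    def total_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Evict oldest-first until the buffer fits, always keeping the newest message.
        while self.total_tokens() > self.token_limit and len(self.messages) > 1:
            self.flushed.append(self.messages.popleft())


buffer = ShortTermBuffer(token_limit=12)
buffer.add("user: my order number is 8841")           # 6 words
buffer.add("agent: thanks, looking that up now")      # 6 words, total 12
buffer.add("user: also change the delivery address")  # total 18, oldest is flushed
print(list(buffer.messages), buffer.flushed)
```

A compaction strategy would replace the `popleft` step with a summary of the evicted messages. The budget check stays the same.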
04 See it work: a memory-flow simulator
Step through a short conversation and watch where each piece of information goes. New turns land in short-term memory (the context window). Important facts are also written to long-term memory (a vector store). When the context window fills, older turns are summarised and flushed out to make room. On a later turn the agent retrieves a relevant fact from long-term memory by similarity. Then try the switch: turn long-term memory OFF and replay — when the window evicts an early fact, there's nowhere to recall it from, and the agent forgets.
- Write: every turn enters short-term memory; salient facts are also copied to long-term memory.
- Summarise / flush: when the window is full, the oldest turns are compacted out (LlamaIndex FIFO flush; OpenAI truncation; Anthropic compaction).
- Retrieve: on later turns, relevant items are pulled back from long-term memory by similarity search.
- Forgetting: with long-term memory OFF, anything evicted from the window is simply gone.
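The four steps above can also be replayed in code. The sketch below is a toy model of the flow, not the page's simulator: `MemoryFlow` and `replay` are hypothetical names, a bag-of-words count vector stands in for a learned embedding, and plain eviction stands in for the summarise/flush step.

```python
import math
from collections import Counter, deque
from typing import Optional


def embed(text: str) -> Counter:
    # Assumption: a bag-of-words count vector stands in for a learned embedding.
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryFlow:
    def __init__(self, window_size: int, long_term_enabled: bool = True) -> None:
        self.window = deque()            # short-term memory: the context window
        self.window_size = window_size   # capacity measured in turns, for simplicity
        self.long_term_enabled = long_term_enabled
        self.long_term = []              # long-term memory: a toy vector store

    def write(self, turn: str, salient: bool = False) -> None:
        # Write: every turn enters the window; salient facts are also persisted.
        self.window.append(turn)
        if salient and self.long_term_enabled:
            self.long_term.append(turn)
        # Flush: evict the oldest turns once the window is over capacity.
        while len(self.window) > self.window_size:
            self.window.popleft()

    def recall(self, query: str) -> Optional[str]:
        # Retrieve: return the most similar long-term item, if any overlaps at all.
        q = embed(query)
        scored = [(cosine(q, embed(item)), item) for item in self.long_term]
        best = max(scored, default=(0.0, None))
        return best[1] if best[0] > 0 else None


def replay(long_term_enabled: bool) -> Optional[str]:
    flow = MemoryFlow(window_size=2, long_term_enabled=long_term_enabled)
    flow.write("user: I am allergic to peanuts", salient=True)
    flow.write("user: what is the weather today")
    flow.write("user: recommend a film for tonight")  # evicts the allergy turn
    return flow.recall("suggest a snack, mind my allergic reaction to peanuts")


print(replay(long_term_enabled=True))
print(replay(long_term_enabled=False))
```

With long-term memory on, the evicted allergy fact is still recoverable by similarity. With it off, `recall` has nothing to search and returns `None`.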
05 Four kinds of memory agents borrow from cognitive science
The CoALA framework (Cognitive Architectures for Language Agents) gives agent memory a useful vocabulary borrowed from how psychologists describe human memory. Alongside short-term working memory, it names three flavours of long-term memory: episodic, semantic, and procedural. A note of caution: frameworks implement these labels loosely and inconsistently, so treat them as a thinking tool, not standardised definitions. Each entry below says what the store holds and gives a grounded example.
Working memory — what's in mind right now
The short-term store: the information in the context window for the current step. It's where the agent does its immediate reasoning, and it's finite. CoALA treats this as distinct from the three long-term stores below.
Episodic — specific past experiences
A record of particular events the agent lived through. Generative Agents keep an experience "memory stream"; Reflexion stores a buffer of verbal self-reflections from past attempts and reuses them on the next try — learning across trials without changing any model weights.
Semantic — generalised facts & knowledge
Distilled facts and knowledge that aren't tied to any single episode — learned user preferences, domain facts, stable truths. This is the layer most often implemented as a vector store of embedded notes you can search by similarity.
Procedural — reusable skills & routines
Knowing how to do things: skills, routines, even executable code. Voyager builds an ever-growing skill library of code skills it can retrieve and compose, which lets capability compound and helps avoid catastrophic forgetting.
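One way to picture the four labels is as four fields on a single container. The sketch below is a thinking aid in the same loose spirit as the labels themselves: `AgentMemory` and its field layout are hypothetical, not a structure defined by CoALA or by any of the systems named above.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical container mirroring the four memory labels."""
    working: list = field(default_factory=list)     # what is in context right now
    episodic: list = field(default_factory=list)    # what happened (past events)
    semantic: dict = field(default_factory=dict)    # what is true (facts, preferences)
    procedural: dict = field(default_factory=dict)  # how to act (named, callable skills)


def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


memory = AgentMemory()
memory.episodic.append("Attempt 1 failed: reported a temperature without units.")
memory.semantic["user.temperature_unit"] = "fahrenheit"
memory.procedural["celsius_to_fahrenheit"] = celsius_to_fahrenheit

# Assemble working memory for the next step from the long-term stores.
memory.working = [
    f"Lesson from last attempt: {memory.episodic[-1]}",
    f"User prefers: {memory.semantic['user.temperature_unit']}",
]

# Procedural memory is retrieved and executed rather than read as text.
skill = memory.procedural["celsius_to_fahrenheit"]
answer = f"{skill(20.0):.0f} F"
print(memory.working, answer)
```

The difference in types is the useful part: episodic and semantic items are read into context, while procedural items are looked up and run.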
How items get into and out of these stores varies. Generative Agents retrieve by combining recency, importance, and relevance scores, and periodically reflect to synthesise raw observations into higher-level insights. Some systems deliberately forget: MemoryBank reinforces or decays stored items using a model inspired by the Ebbinghaus forgetting curve. And a growing pattern is a dedicated memory layer (e.g. Mem0) that sits between the agent and storage to manage the extract/store/retrieve lifecycle — though any cost or latency reductions such vendors cite are vendor-reported and worth treating with caution.
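The scoring and forgetting ideas in this paragraph reduce to small formulas. The sketch below combines recency, importance, and relevance into one retrieval score, and models retention with an Ebbinghaus-style curve, R = exp(-t / S). The decay rate, the equal weights, and the 0 to 1 scales are illustrative assumptions, and the function names are hypothetical rather than taken from either system.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    hours_since_access: float
    importance: float  # 0..1, e.g. a model-assigned rating, rescaled
    relevance: float   # 0..1, e.g. cosine similarity to the current query


def recency(hours: float, decay: float = 0.995) -> float:
    # Exponential decay per hour; the rate is an illustrative assumption.
    return decay ** hours


def retrieval_score(m: Memory) -> float:
    # Equal weights are an assumption; systems may weight or normalise differently.
    return recency(m.hours_since_access) + m.importance + m.relevance


def retention(hours: float, strength: float) -> float:
    # Ebbinghaus-style curve R = exp(-t / S): higher strength means slower forgetting.
    return math.exp(-hours / strength)


memories = [
    Memory("Ate breakfast", hours_since_access=1, importance=0.1, relevance=0.2),
    Memory("Promised to help Sam move on Friday", hours_since_access=48,
           importance=0.9, relevance=0.8),
]
best = max(memories, key=retrieval_score)
print(best.text)

# Reinforcing a memory (raising its strength) slows its decay.
print(retention(24, strength=12), retention(24, strength=48))
```

Here the older memory still wins because importance and relevance outweigh its lower recency. That trade-off is what separates this kind of scoring from a plain most-recent-first queue.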
07 Take it with you & go deeper
Function calling & tool use
How agents act on the world, and how MemGPT-style agents use function calls to page memory in and out.
Context engineering
The discipline of deciding what to keep, summarise, or drop in the context window: short-term memory in practice.
Agentic RAG
Retrieval as an agent capability, closely related to how long-term memory is searched and brought back into context.
Agent evaluation & observability (coming soon)
How to tell whether an agent's memory is actually helping: measuring recall, drift, and behaviour over long runs.
Concept map
Expand each branch to see how the core ideas in this lesson connect.
Why an agent needs memory
- An LLM is stateless — memory is plumbing the system adds around the model, not a property of the model itself.
- Surveys frame memory as a cycle: encode → store → retrieve.
- Without it, an agent can’t carry context across turns, learn from past attempts, or personalise over time.
Short-term vs. long-term
- Short-term (working) memory is the context window for one thread — immediate but finite and lost when the session ends.
- Long-term memory is an external store that survives across sessions and threads.
- LangGraph maps this to thread-scoped checkpointers vs. cross-thread stores.
- The RAM-vs-disk analogy comes from the MemGPT & Letta agent-as-operating-system framing.
The context window as working memory
- It holds everything the model can reference for the current response.
- It’s finite, and recall/accuracy can degrade as it fills — “context rot.”
- On overflow, agents truncate, summarise/compact, or flush older content out to long-term memory.
Write, summarise & retrieve in flow
- Write: every turn enters short-term memory; salient facts are also copied to long-term memory.
- Summarise / flush: when the window is full, the oldest turns are compacted out.
- Retrieve: later turns pull relevant items back from long-term memory by similarity.
- Forgetting: with long-term memory off, anything evicted from the window is simply gone.
Episodic, semantic & procedural memory
- CoALA borrows from cognitive science: working memory plus episodic, semantic, and procedural long-term memory.
- Episodic stores specific past experiences (Generative Agents’ memory stream; Reflexion’s self-reflection buffer).
- Semantic holds generalised facts and preferences, often a searchable vector store.
- Procedural captures reusable skills — Voyager’s ever-growing skill library of executable code.
Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established concepts and is grounded in the references below. Memory-type terminology (episodic / semantic / procedural / working) is borrowed from cognitive science via CoALA and is implemented loosely across frameworks; figures shown in the interactive are illustrative and labelled as such. Several papers below are arXiv preprints; Reflexion and Generative Agents are peer-reviewed venue papers.
- Cognitive Architectures for Language Agents (CoALA) — Sumers, Yao, Narasimhan, Griffiths (Princeton)
- MemGPT: Towards LLMs as Operating Systems — Packer et al. (UC Berkeley)
- Generative Agents: Interactive Simulacra of Human Behavior — Park et al. (Stanford / Google)
- Reflexion: Language Agents with Verbal Reinforcement Learning — Shinn et al. (NeurIPS 2023)
- Voyager: An Open-Ended Embodied Agent with LLMs — Wang et al. (NVIDIA / Caltech)
- MemoryBank: Enhancing LLMs with Long-Term Memory — Zhong et al.
- A Survey on the Memory Mechanism of LLM-based Agents — Zhang et al.
- From Human Memory to AI Memory: A Survey — Wu et al. (Huawei Noah's Ark)
- LangGraph Persistence — checkpointers & stores — LangChain
- Agent Memory: build agents that learn & remember — Letta (formerly MemGPT)
- Agent Memory module — LlamaIndex
- Context Windows — Anthropic
- Assistants — Managing Threads and Messages — OpenAI (deprecated API; architectural illustration)
- Mem0 Platform Overview — Mem0 (vendor; cost/latency claims are vendor-reported)
This is an educational explainer about how agent memory systems are designed. AI systems can produce plausible-sounding but incorrect output, and persistent memory raises real privacy considerations — review what an agent stores about you and your data, and consult your provider's documentation and privacy controls. For decisions with legal, financial, medical, or safety consequences, consult a qualified professional. See the NIST AI Risk Management Framework for governance guidance.
Agent memory architectures — in 8 minutes
Tech Jacks Solutions · AI Knowledge Hub · educational summary
Why memory at all
An LLM is stateless — it remembers nothing between calls. Agent memory is a system built around the model that encodes, stores, and retrieves past information. Mental model: encode → store → retrieve.
Short-term vs long-term
Short-term (working) memory is the context window for the current session/thread — finite, lost when the session ends. Long-term memory persists in an external store across sessions. In LangGraph: checkpointers (short-term) vs stores (long-term). Analogy (MemGPT/Letta): window = RAM, external store = disk.
The context window is working memory
The window holds what the model can reason over now. It's finite and recall degrades as it fills ("context rot," per Anthropic). On overflow, agents truncate (OpenAI threads), flush oldest to long-term memory (LlamaIndex FIFO), or compact (Anthropic).
Write, summarise, retrieve — and forget
Turns are written to short-term memory and salient facts copied to long-term memory; full windows are summarised/flushed; later turns retrieve relevant items by similarity. With long-term memory OFF, anything evicted from the window is forgotten.
Four memory types (CoALA)
Working (context window, now). Episodic (specific past events — Reflexion, Generative Agents). Semantic (generalised facts/preferences). Procedural (reusable skills/code — Voyager's skill library). Terms borrowed loosely from cognitive science.