One topic of conversation that seems to be emerging strongly in the AI world is the importance of memory or context. This is good! It’s a real problem to be solved, and it’s very much the case that LLMs are incomplete without it. Let’s talk about why it’s so much harder than people think.
I like to think of the problem of intelligence as two pieces: how to think, and what to think about. LLMs do a reasonable job on the “how to think” side of things, for the most part. Start the prediction process and it will do a pretty good job of prediction/thinking.
But “what to think about” is totally on us. And we are still doing a lot of work here, sometimes without realizing it, by curating what the LLM sees and pays attention to before it even starts thinking. Sure, modern systems can use search and other tools to pull memories in as they think, but we all know that’s an opportunity for the inference to go off the rails - one bad piece of info from a search, and the bot is off down a rabbit hole.
So that’s challenge number 1: if you are chaining recall (as you would with an agent that does task after task), you have to be careful to have both good precision and good recall. If you don’t have good precision in the memories you bring up, the model will get confused or distracted. If you don’t have good recall, it may make a mistake or make up a new fact (that then gets stored and used later). Without a correction mechanism, these errors accumulate.
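To make the compounding concrete, here’s a back-of-the-envelope sketch in Python. The 95% figures are made up, and it assumes each step’s retrieval succeeds or fails independently, but it shows how fast the odds of a fully clean run fall off in a chain with no correction mechanism.

```python
# Rough illustration (not a real agent): if each task in a chain has some
# probability of pulling a wrong memory (precision failure) or missing a
# needed one (recall failure), and nothing ever corrects those errors,
# the chance that the whole chain stays clean shrinks geometrically.

def p_chain_clean(p_precision: float, p_recall: float, n_steps: int) -> float:
    """Probability that an n-step chain never retrieves a bad memory and
    never misses a needed one, assuming each step is independent."""
    p_step_clean = p_precision * p_recall
    return p_step_clean ** n_steps

for n in (1, 5, 10, 20, 50):
    print(f"{n:>3} steps: {p_chain_clean(0.95, 0.95, n):.2%} chance of a clean run")
```

At fifty chained tasks, even with 95% precision and 95% recall per step, the chance of a run that never pulls a bad memory or misses a needed one is under one percent.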
Another challenge is topology: LLMs are, essentially, serial in nature. Thinking models plus search are starting to challenge this a bit, which is good. But we fundamentally don’t build systems yet that do what the brain does - lots of parallel recall and processing to decide if a memory is relevant, which then impacts other cognitive processes re-entrantly. This is the “hey, wait a minute, that doesn’t seem right” thing that models don’t seem to do very well, and probably why they’re uniformly so gullible. They just have, more or less, a single thread, without our paranoid side processes to help out.
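For flavor, here’s a minimal sketch of what that kind of parallel, slightly paranoid recall could look like. Everything in it is a stand-in: score_relevance is just keyword overlap and contradicts is a toy negation check, where a real system would use models for both.

```python
# A minimal sketch of "parallel recall": score many candidate memories
# concurrently, keep the relevant ones, and flag anything that contradicts
# the current working context (the "hey, wait a minute" pass).
from concurrent.futures import ThreadPoolExecutor

def score_relevance(memory: str, query: str) -> float:
    # Stand-in: word overlap between the memory and the query.
    q, m = set(query.lower().split()), set(memory.lower().split())
    return len(q & m) / max(len(q), 1)

def contradicts(memory: str, context: str) -> bool:
    # Toy stand-in for a real consistency check: does this memory negate
    # something the current context asserts?
    m = memory.lower()
    return "not" in m.split() and m.replace("not ", "") in context.lower()

def parallel_recall(memories: list[str], query: str, context: str, threshold: float = 0.3):
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda m: score_relevance(m, query), memories))
    relevant = [m for m, s in zip(memories, scores) if s >= threshold]
    objections = [m for m in relevant if contradicts(m, context)]
    return relevant, objections
```

The point is the shape: many candidate memories get scored at once, and a separate “wait a minute” pass gets to object before anything lands in the single thread of inference.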
Another challenge with memory is what seems to be happening with AI-enabled browsers and other agentic tools. Remember, there aren’t any good mechanisms for preventing jailbreaks. Sure, they can be made harder, but dedicated attackers still always win. Which means if you have a browser that allows an agent to “remember” everything across all of your tabs, congrats! You’ve reinvented CSRF attacks, but now massively amplified by AI. So segregating memories and “keeping secrets” isn’t something current models can do on their own - we have to impose that from outside with code.
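That outside enforcement doesn’t have to be clever. Here’s an illustrative sketch (all names made up) of a memory store that tags every memory with the origin it came from and simply refuses cross-origin reads, so the model never gets the chance to decide whether to leak.

```python
# Hard isolation enforced in code, not in the prompt: each memory is tagged
# with its origin, and reads only return memories for the requesting origin.
from collections import defaultdict

class ScopedMemoryStore:
    def __init__(self):
        self._by_origin: dict[str, list[str]] = defaultdict(list)

    def write(self, origin: str, memory: str) -> None:
        self._by_origin[origin].append(memory)

    def read(self, requesting_origin: str) -> list[str]:
        # No cross-origin reads, no matter what the agent asks for.
        return list(self._by_origin[requesting_origin])

store = ScopedMemoryStore()
store.write("bank.example.com", "account balance: ...")
store.write("blog.example.com", "draft post text")
assert store.read("blog.example.com") == ["draft post text"]
```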
Finally, even though token contexts are getting very large, just dumping everything into one doesn’t usually work particularly well. Hard to imagine, as a human, being able to read several novels and then pick out the few salient facts to do a simple task. Like having to read not only the whole manual, but also a textbook on steel production and machining, before being able to change a spark plug on your car. A bigger context helps, but not by much, and it doesn’t scale well.
We will need to build real, robust memory systems for LLMs. These need to do all kinds of things: keep secrets, consolidate memories so that the system isn’t overwhelmed with redundancy, inhibit and enhance by use, decay over time but still be able to recall with precision, run in parallel to assess low-probability memories, share memories between agents, make copies for retention and deletion, and probably more.
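To give a feel for what that implies, here’s a rough sketch (illustrative, not a spec; the fields and the decay formula are assumptions) of what a single record in such a system might have to track just to cover a few of those requirements: a scope for keeping secrets, a use count for enhancement and inhibition, a decay curve, and a note of what got consolidated into it.

```python
# One record in a hypothetical memory system, mapped loosely onto the
# requirements above. Nothing here is a standard; it's a shape, not a spec.
import math
import time
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    content: str
    scope: str                  # who may read it ("keep secrets")
    created_at: float = field(default_factory=time.time)
    last_used_at: float = field(default_factory=time.time)
    use_count: int = 0          # enhance with use, inhibit with disuse
    consolidated_from: list[str] = field(default_factory=list)  # redundancy folded in

    def touch(self) -> None:
        self.use_count += 1
        self.last_used_at = time.time()

    def strength(self, half_life_days: float = 30.0) -> float:
        """Recency- and use-weighted strength: decays over time, but
        repeated use pushes it back up."""
        age_days = (time.time() - self.last_used_at) / 86400
        decay = math.exp(-math.log(2) * age_days / half_life_days)
        return decay * (1 + math.log1p(self.use_count))
```

And that’s just one record; consolidation, sharing between agents, and retention and deletion policies all have to live above this layer.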
Memory is the second half of AGI. We have work to do.