One topic of conversation that seems to be emerging strongly in the AI world is the importance of memory or context. This is good! It’s a real problem to be solved, and it’s very much the case that LLMs are incomplete without it. Let’s talk about why it’s so much harder than people think.
I like to think of the problem of intelligence as two pieces: how to think, and what to think about. LLMs do a reasonable job on the “how to think” side of things, for the most part. Start the prediction process and it will do a pretty good job of prediction/thinking.
But “what to think about” is totally on us. And we are still doing a lot of work here, sometimes without realizing it, by curating what the LLM sees and pays attention to before it even starts thinking. Sure, modern systems can use search and other tools to pull memories in as they think, but we all know that’s an opportunity for the inference to go off the rails - one bad piece of info from a search, and the bot is off down a rabbit hole.
So that’s challenge number 1: if you are chaining recall (as you would with an agent that does task after task), you have to be careful to have both good precision and good recall. If you don’t have good precision in the memories you bring up, the model will get confused or distracted. If you don’t have good recall, it may make a mistake or make up a new fact (that then gets stored and used later). Without a correction mechanism, these errors accumulate.
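To make the compounding concrete, here’s a back-of-the-envelope sketch in Python. The 95% figures are made up, and it assumes each step’s retrieval succeeds or fails independently, but it shows how fast the odds of a fully clean run fall off in a chain with no correction mechanism.

```python
# Rough illustration (not a real agent): if each task in a chain has some
# probability of pulling a wrong memory (precision failure) or missing a
# needed one (recall failure), and nothing ever corrects those errors,
# the chance that the whole chain stays clean shrinks geometrically.

def p_chain_clean(p_precision: float, p_recall: float, n_steps: int) -> float:
    """Probability that an n-step chain never retrieves a bad memory and
    never misses a needed one, assuming each step is independent."""
    p_step_clean = p_precision * p_recall
    return p_step_clean ** n_steps

for n in (1, 5, 10, 20, 50):
    print(f"{n:>3} steps: {p_chain_clean(0.95, 0.95, n):.2%} chance of a clean run")
```

At fifty chained tasks, even with 95% precision and 95% recall per step, the chance of a run that never pulls a bad memory or misses a needed one is under one percent.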
Another challenge is topology: LLMs are, essentially, serial in nature. Thinking models plus search are starting to challenge this a bit, which is good. But we fundamentally don’t build systems yet that do what the brain does - lots of parallel recall and processing to decide if a memory is relevant, which then impacts other cognitive processes re-entrantly. This is the “hey, wait a minute, that doesn’t seem right” thing that models don’t seem to do very well, and probably why they’re uniformly so gullible. They just have, more or less, a single thread, without our paranoid side processes to help out.
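For flavor, here’s a minimal sketch of what that kind of parallel, slightly paranoid recall could look like. Everything in it is a stand-in: score_relevance is just keyword overlap and contradicts is a toy negation check, where a real system would use models for both.

```python
# A minimal sketch of "parallel recall": score many candidate memories
# concurrently, keep the relevant ones, and flag anything that contradicts
# the current working context (the "hey, wait a minute" pass).
from concurrent.futures import ThreadPoolExecutor

def score_relevance(memory: str, query: str) -> float:
    # Stand-in: word overlap between the memory and the query.
    q, m = set(query.lower().split()), set(memory.lower().split())
    return len(q & m) / max(len(q), 1)

def contradicts(memory: str, context: str) -> bool:
    # Toy stand-in for a real consistency check: does this memory negate
    # something the current context asserts?
    m = memory.lower()
    return "not" in m.split() and m.replace("not ", "") in context.lower()

def parallel_recall(memories: list[str], query: str, context: str, threshold: float = 0.3):
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda m: score_relevance(m, query), memories))
    relevant = [m for m, s in zip(memories, scores) if s >= threshold]
    objections = [m for m in relevant if contradicts(m, context)]
    return relevant, objections
```

The point is the shape: many candidate memories get scored at once, and a separate “wait a minute” pass gets to object before anything lands in the single thread of inference.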
Another challenge with memory is what seems to be happening with AI-enabled browsers and other agentic tools. Remember, there aren’t any good mechanisms for preventing jailbreaks. Sure, they can be made harder, but dedicated attackers still always win. Which means if you have a browser that allows an agent to “remember” everything across all of your tabs, congrats! You’ve reinvented CSRF attacks, but now massively amplified by AI. So segregating memories and “keeping secrets” isn’t something current models can do on their own - we have to impose that from outside with code.
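That outside enforcement doesn’t have to be clever. Here’s an illustrative sketch (all names made up) of a memory store that tags every memory with the origin it came from and simply refuses cross-origin reads, so the model never gets the chance to decide whether to leak.

```python
# Hard isolation enforced in code, not in the prompt: each memory is tagged
# with its origin, and reads only return memories for the requesting origin.
from collections import defaultdict

class ScopedMemoryStore:
    def __init__(self):
        self._by_origin: dict[str, list[str]] = defaultdict(list)

    def write(self, origin: str, memory: str) -> None:
        self._by_origin[origin].append(memory)

    def read(self, requesting_origin: str) -> list[str]:
        # No cross-origin reads, no matter what the agent asks for.
        return list(self._by_origin[requesting_origin])

store = ScopedMemoryStore()
store.write("bank.example.com", "account balance: ...")
store.write("blog.example.com", "draft post text")
assert store.read("blog.example.com") == ["draft post text"]
```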
Finally, even though token contexts are getting very large, just dumping everything into one doesn’t usually work particularly well. Hard to imagine, as a human, being able to read several novels and then pick out the few salient facts to do a simple task. Like having to read not only the whole manual, but also a textbook on steel production and machining, before being able to change a spark plug on your car. A bigger context helps, but not by much, and it doesn’t scale well.
We will need to build real, robust memory systems for LLMs. These need to do all kinds of things: keep secrets, consolidate memories so that the system isn’t overwhelmed with redundancy, inhibit and enhance by use, decay over time but still be able to recall with precision, run in parallel to assess low-probability memories, share memories between agents, make copies for retention and deletion, and probably more.
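To give a feel for what that implies, here’s a rough sketch (illustrative, not a spec; the fields and the decay formula are assumptions) of what a single record in such a system might have to track just to cover a few of those requirements: a scope for keeping secrets, a use count for enhancement and inhibition, a decay curve, and a note of what got consolidated into it.

```python
# One record in a hypothetical memory system, mapped loosely onto the
# requirements above. Nothing here is a standard; it's a shape, not a spec.
import math
import time
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    content: str
    scope: str                  # who may read it ("keep secrets")
    created_at: float = field(default_factory=time.time)
    last_used_at: float = field(default_factory=time.time)
    use_count: int = 0          # enhance with use, inhibit with disuse
    consolidated_from: list[str] = field(default_factory=list)  # redundancy folded in

    def touch(self) -> None:
        self.use_count += 1
        self.last_used_at = time.time()

    def strength(self, half_life_days: float = 30.0) -> float:
        """Recency- and use-weighted strength: decays over time, but
        repeated use pushes it back up."""
        age_days = (time.time() - self.last_used_at) / 86400
        decay = math.exp(-math.log(2) * age_days / half_life_days)
        return decay * (1 + math.log1p(self.use_count))
```

And that’s just one record; consolidation, sharing between agents, and retention and deletion policies all have to live above this layer.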
Memory is the second half of AGI. We have work to do.