We all have close friends. We share memories with them. Remove those memories from both people, and you’re just strangers. Memory is more than just the glue holding the relationship together - in some sense, it is the relationship.
You have memories too. They’re encoded (as far as you can tell) in organic hardware that is very difficult to modify. That encoding is what lets you trust those memories - it’s highly unlikely that you just popped into existence a second ago, fully formed, with the memories of a full life, or that someone edited your memories in some harmful but undetectable way. Erase those memories completely, and you’re not you anymore. Memory doesn’t just hold you together - it IS you, in some sense.
Intelligence is a mixture of two things: some kind of processing/prediction engine (a thing that “thinks”) and some kind of curation mechanism that aims that engine at certain information, somehow. Cognition, and memory.
Memory is what helps you introspect about yourself - remembering that you’ve tried something in the past and it worked, or didn’t, and adjusting accordingly. It’s what helps you navigate complex tasks (though we often need memory aids - notes - to keep it all straight).
LLMs don’t really have memory. Once trained, the model is static - unlike a human, it doesn’t really learn from new experiences. It has a set of “things to pay attention to” passed to it in the form of a prompt, and an engine that does the cognition solely on that information.
This means that, when building for an LLM, the burden is entirely on the developer to construct the right set of information in each prompt, so that the “thinking engine” can do its job. In some sense, that engine is utterly helpless - it can’t really examine the world, ask for help, or otherwise manipulate what is given to it. It just processes and predicts.
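To make that concrete, here is a minimal sketch of what that burden looks like in practice. Everything here is illustrative - the function name, the sections, the wording are not a prescribed format - but the shape is the point: every call has to carry the task, the history, and any supporting facts, because nothing persists on the model’s side between calls.

```python
def build_prompt(task: str, prior_steps: list[str], facts: list[str]) -> str:
    """Pack the goal, what has happened so far, and relevant facts into one prompt."""
    return "\n\n".join([
        "## Task\n" + task,
        "## Already done\n" + "\n".join(f"- {s}" for s in prior_steps),
        "## Relevant facts\n" + "\n".join(f"- {f}" for f in facts),
        "## Instruction\nPropose the next step only.",
    ])

prompt = build_prompt(
    task="Migrate the billing service to the new API",
    prior_steps=["Inventoried endpoints", "Wrote compatibility shims"],
    facts=["The old API is retired on June 1", "Payments must not be interrupted"],
)
# `prompt` is everything the model will "know" for this call; once it answers,
# that knowledge is gone unless the developer carries it into the next prompt.
```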
I realize I’ve written about this before, and I apologize if this is redundant. But the more I stare at parts of this problem, the more I can’t let go of this aspect. When we talk about recipes, we are talking about using code to compensate for this memory gap, usually as it applies to metacognition and self-management. RAG and other techniques are about building the memories themselves, and they’re surprisingly hard to do well. Much of prompt engineering is really the human spending time building memory structures for the helpless model.
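As a rough illustration of the RAG side of this, here is a toy retrieval sketch. The `embed` function is a throwaway stand-in for a real embedding model, and the whole thing glosses over the parts that are actually hard - chunking, ranking, freshness, contradictions - which is exactly where “surprisingly hard to do well” comes from.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Throwaway stand-in embedding (hashed bag of words); use a real model in practice."""
    v = np.zeros(256)
    for token in text.lower().split():
        v[hash(token) % 256] += 1.0
    return v

class MemoryStore:
    """Store text chunks with vectors; retrieve the nearest ones at question time."""
    def __init__(self):
        self.chunks: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, chunk: str) -> None:
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        sims = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
                for v in self.vectors]
        top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
        return [self.chunks[i] for i in top]

store = MemoryStore()
store.add("The deploy script lives in tools/deploy.sh and requires VPN access.")
store.add("Customer invoices are generated nightly by the billing cron job.")
question = "How do invoices get created?"
prompt = "Context:\n" + "\n".join(store.retrieve(question)) + "\n\nQuestion: " + question
# The retrieved chunks are the "memories" handed to the model for this one call.
```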
I don’t think we understand the general solution to this yet. I’ve seen enough twists and turns to think we still have work to do. Large token contexts and test-time compute help to a degree, but much of what is still fragile in LLMs seems to boil down to this problem, particularly for large and complex tasks. I do think we can solve it in narrow domains (vertical apps), which is where some of my energy is going these days. And I hope that the narrow tasks will continue to educate us on the broader problem.
Complicating all of this is that there’s not really a great metric for it. We can talk about precision vs. recall, sure, but you end up measuring things like “did the 12th step of the 37-step process have the right info to not go off the rails later on?”, which is hard to define rigorously.
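A toy example of why the measurement is slippery: even if you hand-label the facts that a given step needed, all you get back is precision and recall of what happened to be in the context at that step. The labels are the expensive part, and the numbers say nothing about whether a missing fact derails things twenty steps later. All of the names below are hypothetical.

```python
def step_context_scores(required: set[str], provided: set[str]) -> tuple[float, float]:
    """Precision/recall of the facts present in one step's working context."""
    if not provided or not required:
        return 0.0, 0.0
    hits = required & provided
    precision = len(hits) / len(provided)   # how much of the context was actually needed
    recall = len(hits) / len(required)      # how much of what was needed made it in
    return precision, recall

print(step_context_scores(
    required={"db schema", "retry policy", "API version"},
    provided={"db schema", "API version", "old changelog", "style guide"},
))  # -> (0.5, 0.666...): measurable, but not obviously predictive of a failure at step 30
```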
I’d love to hear from folks who think they have a better grip on a more general solution to this. For a while, vector and graph databases looked promising, but they don’t seem to work as well as hoped. Large contexts can help in some places. I still find it interesting that tasks that should be achievable - like rewriting a well-known large program or tool in another well-known language - are out of reach, even with models that are clearly competent at all of the individual pieces.
Neurology demonstrates the dependence that memory (and ideation in general, BTW) has on emotional centers. No emotion, no memory - efficient filtering.
So what's emotion? Ultimately it rests on pleasure/pain and fight/flight. It discerns what's good or bad for the organism and directs it toward success in a thermodynamic field. Or something like that.
My point is, memory is a fundamental part of mind, in service to the organism's success. And success is ultimately how you FEEL - in your body. No body, no mind. LLMs are pretty far from that.
As a coda, human memory is notoriously weak on the details. We remember the feeling mostly. So… feeling to filter and feeling to recall. Not sure what your plan is there.
Like our non-episodic memories, LLMs compress information. Training with added information updates the network, just as our memories do. We also have relatively poor episodic memory; AFAIK, episodes are not encoded in LLMs, but they could be.
Now suppose we introduce a feedback loop in which the LLM makes predictions that are tested against external inputs, and its outputs are adjusted based on the size of the prediction error.
An LLM should be receiving constant inputs, updating its memory, and responding to those inputs, whether they come from the environment or from conversations.
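Here is a self-contained toy version of that loop, under loose assumptions: "observations" are vectors, the "prediction" is just an average of recent memory, and an observation is written to memory only when the prediction error is large. The point is the shape of the loop - predict, compare against external input, let surprise gate what gets remembered - not any particular model.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def run_loop(observations: list[np.ndarray], error_threshold: float = 0.3) -> list[np.ndarray]:
    """Predict, compare with what actually arrived, and store only the surprises."""
    memory = [observations[0]]                        # seed with the first observation
    for obs in observations[1:]:
        prediction = np.mean(memory[-3:], axis=0)     # crude stand-in for the model's guess
        error = cosine_distance(prediction, obs)      # how surprising was the input?
        if error > error_threshold:
            memory.append(obs)                        # surprising -> worth remembering
        # small errors are dropped, which acts as a filter on what gets remembered
    return memory

# Usage: a stream that mostly drifts slowly, with an occasional genuinely new input;
# only the jumps (and accumulated drift) end up in memory.
rng = np.random.default_rng(1)
state = rng.normal(size=16)
stream = []
for t in range(200):
    if t % 50 == 0:
        state = rng.normal(size=16)
    stream.append(state + 0.05 * rng.normal(size=16))
print(len(run_loop(stream)), "of", len(stream), "observations kept")
```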
When thinking, specifically reasoning rather than recalling, is this feedback a way to get to a consistent answer? [I have in mind the kind of convergence we see in sparse distributed memory.]
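For readers who haven't seen it, here is a minimal Kanerva-style sparse distributed memory sketch showing that convergence: patterns are stored autoassociatively across many random "hard locations", and iterated reads from a noisy cue settle back onto the stored pattern. The sizes are arbitrary toy values, not a claim about how LLMs work.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, RADIUS = 256, 2000, 112            # address bits, hard locations, activation radius

hard_addresses = rng.integers(0, 2, size=(M, N))   # random binary location addresses
counters = np.zeros((M, N))                        # one counter vector per location

def active(address):
    """Boolean mask of hard locations within Hamming distance RADIUS of the address."""
    return np.sum(hard_addresses != address, axis=1) <= RADIUS

def write(pattern):
    """Autoassociative write: store the pattern at its own address."""
    counters[active(pattern)] += 2 * pattern - 1    # +1 for 1-bits, -1 for 0-bits

def read(address):
    """Majority vote over the counters of all active locations."""
    return (counters[active(address)].sum(axis=0) > 0).astype(int)

# Store one target pattern plus some others to create realistic crosstalk.
target = rng.integers(0, 2, size=N)
write(target)
for _ in range(50):
    write(rng.integers(0, 2, size=N))

# Corrupt ~15% of the target's bits, then read iteratively: each readout becomes
# the next read address, and the Hamming distance to the target shrinks.
cue = target.copy()
cue[rng.choice(N, size=40, replace=False)] ^= 1
for step in range(5):
    cue = read(cue)
    print(step, int(np.sum(cue != target)))
```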
Constant updating like this will only be possible when the LLM can handle inputs as fast as human brains do, which implies that any memory formation must be rapid.