Memory, context, thinking
Like most folks this morning (I write these on Friday), I’ve been reading about and playing with OpenAI’s o1. It’s cool! It can do some genuinely new and interesting things. I didn’t get early access to it, even as a Microsoft employee, so I’m learning at the same time as everyone else. One quick observation: it confirms what I’ve been suspecting about memory and context.
A lot of GPT-4’s behavior is weird and funny - we’ve gotten used to it, but it’s like a really smart person who can’t remember much of anything. Things like RAG help a bunch, but the flaws in existing memory systems make the model seem very naive - jailbreaks often rely on it being “fooled” in a way a human wouldn’t be, because we can see and understand a larger context. We have continuous memory and can build more robust world models with it.
It’s striking that o1 seems to show you can get better results by applying compute at inference time, not just at training time or through model size. That points me in the direction of memory - working memory, at least. When you ask GPT-4 to solve a multi-step problem, it struggles because the steps are disconnected: the memory of what is being done has to be passed between each prompt, and that’s fragile. o1 seems (I don’t fully understand how it works yet) to keep its working memory in a more usable state as it works through its reasoning. That may be at least part of why it’s better - it doesn’t “lose the plot” as much on long and complex tasks.
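To make that concrete, here’s a minimal sketch of the two patterns in Python. `call_model` is a hypothetical stand-in for any chat-completion API (stubbed so the sketch runs); this is my illustration of the failure mode, not a claim about how o1 actually works internally.

```python
def call_model(prompt: str) -> str:
    """Hypothetical LLM call; returns the model's text response (stubbed)."""
    return f"[response to {len(prompt)} chars of prompt]"


# Pattern 1: fragile chaining. Each step sees only the text handed forward
# from the previous step, so omissions and errors compound as the chain grows.
def solve_by_chaining(task: str, steps: list[str]) -> str:
    carried = task
    for step in steps:
        # The entire "memory" of prior work is whatever text we pass along here.
        carried = call_model(f"State so far:\n{carried}\n\nNext step: {step}")
    return carried


# Pattern 2: a persistent scratchpad. Every step sees the full accumulated
# working memory - closer in spirit to how o1 appears to hold its reasoning.
def solve_with_scratchpad(task: str, steps: list[str]) -> str:
    scratchpad = [f"Task: {task}"]
    for step in steps:
        result = call_model("\n".join(scratchpad) + f"\n\nNext step: {step}")
        scratchpad.append(f"{step} -> {result}")  # nothing is ever dropped
    return scratchpad[-1]
```

The second pattern burns more tokens on every call, which is exactly the “apply compute at inference time” trade-off: you’re paying to keep the working state intact instead of hoping a summary survives the handoff.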
Not much else to say yet, other than: put on your “What if” hat, remember there are no prizes for being pessimistic and right, and start diving in and making messes and learning.
(As an aside, I spent a bunch of time looking into Flux and LoRA this week - I was wrong that there aren’t more complex workflows around AI out there. It’s still very manual, but the visual design/generation world is now very rich, with components like LoRA, many fine-tuned models, several full-featured platforms you can work on (both online and desktop), and strong communities. I hope we see more of this for agents and prompts soon.)
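For a flavor of how composable those pieces have become, here’s a hedged sketch of stacking a LoRA on top of a Flux base model with Hugging Face’s diffusers library. The `FluxPipeline` and `load_lora_weights` calls are real diffusers APIs, but the LoRA repo name and prompt are placeholders I made up; check the diffusers docs for current model IDs and hardware requirements.

```python
import torch
from diffusers import FluxPipeline

# Load the base Flux model (gated on Hugging Face; requires accepting the license).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Apply a community LoRA on top of the base weights.
# "some-user/watercolor-lora" is a placeholder, not a real checkpoint.
pipe.load_lora_weights("some-user/watercolor-lora")

pipe.to("cuda")  # Flux is large; a high-memory GPU (or CPU offloading) is needed

image = pipe(
    "a fox in a watercolor style",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("fox.png")
```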