I’ve spent a fair bit of time over the past few years both trying to get AI systems to do useful work and watching others do the same (there are a lot of folks much better at this than me - Ethan Mollick and Simon Willison are both very much worth reading, if you’re interested in learning what can be done with AI). The tools are undoubtedly useful - I see more and more folks finding ways to get real value from them every day.
And yet - it doesn’t quite feel like working with another person, does it? And it struggles with certain kinds of long tasks. Why is that? The answer, I think, is a pattern we might call “the topology of thought”.
LLMs, fundamentally, have a very linear way of thinking. A set of tokens (“prompt”) is given to the model, and it predicts the next token, adding that to the list and predicting the next after that, etc. It’s linear: you ask a question, and it answers. That’s it. It can’t be meaningfully interrupted to add context (only restarted), and it’s not doing anything in parallel - there are no separate self-monitoring processes modifying the “central” thought process.
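To make that loop concrete, here’s a minimal sketch in Python using Hugging Face transformers - the model choice and greedy decoding are just illustrative (real systems sample rather than always taking the top token), but the shape of the process is the same: one token at a time, strictly in sequence.

```python
# A minimal sketch of the serial, autoregressive loop described above.
# "gpt2" is just a small, convenient stand-in for any causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The topology of thought matters because"
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(40):                      # one token at a time, strictly in sequence
        logits = model(ids).logits
        next_id = logits[0, -1].argmax()     # greedily pick the most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append it and go again

print(tokenizer.decode(ids[0]))
```

There is no place in that loop to add context mid-stream or to run a second, supervisory process - if you want to change anything, you start over with a new prompt.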
Humans are much more iterative, more continuous, and more parallel. All of our thinking is simultaneously training our brain (we call this memories), and we have all kinds of things happening in parallel. There are lots of experiments where, for example, a person with a brain injury can’t articulate something they’re seeing but can identify it in other ways - cases where clearly separate processes are happening.
And we iterate a lot, both privately and with each other. We have lots of metacognitive strategies that take advantage of all of this. If I give you a hard problem, you will probably do something like “research a little, break it down, try a piece of it, pop back up to apply learning, iterate until done”.
Some of this is beginning to be simulated with the “thinking” models. But one way to look at that approach is that it’s still trying to cram a lot of this behavior into a single inference interaction. Instead of “ask a question, get an answer” it’s still “ask a question, get an answer” - the process is just a bit longer, can do a bit of limited metacognition, and can simulate a bit of working memory. But it’s still a single prompt, and it’s still serial.
That’s probably ok - we’ve probably built a pretty good simulation of the fundamental building block of the human mind, something like a cortical stack, but much bigger. That’s cool, we can work with that! But what we really need now is to build different topologies of thought: instead of serial, parallel and self-modifying - systems that can observe their own behavior, make note of failures, adjust, record metacognitive strategies to address those common patterns, etc.
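As a rough illustration of what I mean by self-observing, here’s a hedged sketch of one such topology: a worker inference plus a separate critic that watches its output and records notes that persist across attempts. `call_llm` is a hypothetical stand-in for whatever inference API you’re using; the control flow is the point, not the details.

```python
# A sketch of a "parallel and self-modifying" topology: one process works the
# task, a second process observes it, and the critiques accumulate as
# metacognitive notes that shape the next attempt.

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in - wire this up to your model of choice.
    raise NotImplementedError

def solve_with_self_observation(task: str, max_attempts: int = 3) -> str:
    notes: list[str] = []                     # persistent memory across attempts
    answer = ""
    for _ in range(max_attempts):
        context = task + "\nLessons from earlier attempts:\n" + "\n".join(notes)
        answer = call_llm(context)
        # A separate, self-monitoring pass critiques the main one's output.
        critique = call_llm(f"Task: {task}\nAnswer: {answer}\nWhat failed, if anything?")
        if "nothing" in critique.lower():
            return answer
        notes.append(critique)                # adjust strategy before trying again
    return answer
```

Nothing here is sophisticated - the point is that the memory and the monitoring live outside any single inference.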
Some products are starting to do this. Most chatbots are still very serial - you can interrupt the current inference but not really “talk” naturally to it. A few are starting to build more natural interfaces to do this. Manus so far is the best one I’ve seen.
It’s great to continue to push thinking models. But it’s probably the case that this is going to be more expensive, and even if it isn’t, it’s not a great way to interact. We want to be able to iterate. We want assistants that self-observe, modify, and remember across many inferences. We probably need to be building better scaffolding around these models now - better ways to interact, interrupt, partner, remember, and experiment.
Topology of thought matters. If you use AI, notice how different your interactions with it are from your interactions with actual people, with this in mind.
(as an aside: one of my personal hallmarks for understanding if something is truly disruptive is whether there’s a bifurcation in how people view it. If you have a group of people, no matter how small at first, who really get it, and another group who really hate it, and not much in between, that polarization is a good indication that the tech is really going to change the world. Particularly if the enthusiasts continually grow over time. Yep, there are problems, but the AI tech is real, and has real value, and is disruptive)
One way to bypass the linear model is to do what humans do - work in groups and use the techniques Edward de Bono espoused. Use a few LLMs to replicate these ideas, letting them interact through an API. Yes, it’s much more costly to run, which reinforces my contention that we should run these LLMs locally. What we may need is models that are trained to interact and work through more complex problems together, rather than re-architecting a single LLM to take the more convoluted approach itself. In some sense, the aim is to create a hive mind: different models with different training and expertise, able to work together to solve complex problems, just as humans can do.
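Here’s a hedged sketch of what that might look like in practice, in the spirit of de Bono’s thinking hats: a few models (or one model in a few roles), each prompted into a different stance, building on one another’s contributions. It assumes an OpenAI-compatible endpoint such as a local Ollama or llama.cpp server; the URL, model name, and role prompts are all illustrative.

```python
# A sketch of a small "hive mind": several thinking roles, each seeing the
# discussion so far and adding its own perspective.
from openai import OpenAI

# Assumed: a local OpenAI-compatible server (e.g. Ollama's default port).
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

ROLES = {
    "facts":   "State only the known facts and the open questions.",   # white hat
    "caution": "Point out risks, flaws, and missing considerations.",  # black hat
    "ideas":   "Propose creative alternatives, however unusual.",      # green hat
}

def contribute(role_prompt: str, discussion: str) -> str:
    resp = client.chat.completions.create(
        model="llama3",  # illustrative local model name
        messages=[
            {"role": "system", "content": role_prompt},
            {"role": "user", "content": discussion},
        ],
    )
    return resp.choices[0].message.content

problem = "How should a small team structure a week-long research sprint?"
discussion = problem
for name, role in ROLES.items():
    discussion += f"\n\n[{name}] {contribute(role, discussion)}"  # each role sees all prior turns

print(discussion)
```

In a real version you’d give each role a different model (different training, different expertise) and let them run multiple rounds, but even this simple relay is a different topology than a single serial inference.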