The progress in model development is really staggering. It’s hard to keep track: every day seems to bring a model with a new capability - this week it was agents, before that, very inexpensive o1 clones, and so on. It’s really tempting to think that the models are all there is, and to just wait for them to absorb all the functionality. There is also a weirdly passive attitude towards how the models are invoked - a sort of “ask one question and hope for the best”. Of course, people have longer conversations with the models, but overall the interactions are very much single-threaded. There doesn’t seem to be a lot of parallel execution out there.
Some day we might have a model that just “does it all”. But right now, the approach to building experiences feels a bit like going to a restaurant and having them put a perfect, uncooked carrot on your plate and say, “isn’t this ingredient AMAZING?”. Well, yes, it is, but I’m here for dinner, not to admire the ingredients - do the cooking, set the table…
It’s easy to be distracted by the progress in models, and that progress absolutely makes it more confusing to decide what to build. But the experiences we have today leave a lot to be desired. People have to spend a lot of time setting up context, iterating with the model, extracting the work into other forms, and so on. Models are great at doing work once a human has framed the task, but that dependence on human framing limits how useful they can be.
Another example of this limited mindset: I was reading about some “AI-companion” products where people were complaining that the model ran out of context after a few weeks and started forgetting things. Sure! This is because the implementation is just “accrete the history into the context, chop off the top when it gets too long”. But that’s an incredibly simplistic approach - we’ve had vector-based RAG systems for almost two years now. It’s possible to build much better (and faster) conversational systems that have much longer history and memory. We built one as an experiment two years ago - it also had a lot of features that are mostly missing from chatbots even today, like the ability to see and edit the memories and history of the conversation.
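The retrieval-based alternative to “chop off the top” is simple enough to sketch. Below is a minimal, self-contained illustration of a conversational memory that recalls relevant entries by vector similarity and lets the user inspect and edit them directly. The bag-of-words `embed` function is a toy stand-in - a real system would use a sentence-embedding model and a proper vector store - and all class and function names here are hypothetical, not from the experiment described above.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding". A real system would call a
    # sentence-embedding model; this is a placeholder for the sketch.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Memory:
    """Long-lived conversational memory: instead of truncating history,
    store every entry and retrieve only the relevant ones per turn."""

    def __init__(self):
        self.entries = []  # (text, vector) pairs, fully visible to the user

    def add(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Rank stored memories by similarity to the query and return
        # the top k - these, not the whole history, go into the prompt.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def edit(self, index: int, new_text: str) -> None:
        # The user can see and rewrite memories, rather than being
        # stuck with whatever the model accreted.
        self.entries[index] = (new_text, embed(new_text))

mem = Memory()
mem.add("The user's cat is named Miso")
mem.add("The user lives in Lisbon")
mem.add("The user prefers short answers")
print(mem.recall("what is the cat called", k=1))
# → ["The user's cat is named Miso"]
```

The point is that retrieval decouples memory size from context size: the prompt stays small no matter how many weeks of history accumulate.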
Let’s talk about data analysis now. Inference is getting so cheap that the cheapest model I’ve seen this week (one of the DeepSeek versions) could examine 100 tokens of each record in a 1 million record DB for less than $15 - that’s 100 million input tokens, or roughly 15 cents per million. That’s kind of insane - we should be taking advantage of that kind of cheap, async inference much more than we are, to do things like “MapReduce but LLM”, or even better agentic memory, etc.
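The “MapReduce but LLM” pattern is worth spelling out: fan one cheap model call out per record in parallel, then aggregate the per-record results. Here is a minimal sketch; the `llm` function is a hypothetical stub standing in for a real API client, and the sentiment-classification task is just an example.

```python
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    # Hypothetical stand-in for a call to a cheap model endpoint.
    # Toy rule so the sketch runs; a real version would hit the API.
    return "positive" if "great" in prompt.lower() else "negative"

def map_phase(records: list[str], workers: int = 32) -> list[str]:
    # Map: one independent, parallel LLM call per record.
    # At ~100 tokens per record, 1M records is ~100M input tokens -
    # a few dollars at current cheap-model prices.
    prompt = "Classify the sentiment of this review: "
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: llm(prompt + r), records))

def reduce_phase(labels: list[str]) -> dict[str, int]:
    # Reduce: aggregate the per-record outputs. A second LLM pass
    # could summarize instead of this simple tally.
    counts: dict[str, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return counts

reviews = ["Great product", "broke in a week", "great value"]
print(reduce_phase(map_phase(reviews)))
# → {'positive': 2, 'negative': 1}
```

Because each map call is independent, throughput is limited only by API rate limits, not by the single-threaded back-and-forth of a chat session.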
The coming year or two will show really startling base capabilities and cost reductions. That doesn’t mean there isn’t real value in building thoughtful user experiences - on the contrary, that may become the most important thing now, as models rapidly compete and commoditize. Building history and stickiness, embracing the idea that “users are lazy, make it easy for them”, polishing and completing all the details - this will matter more and more.