For the past several months, I’ve been working on building code around LLMs. Not “here’s a hand-crafted prompt that does a cool party trick”, but really confronting what it takes to build real, repeatable, robust programs on top of these models, using multiple inference calls to multiple models. These models have interesting properties, limits, and capabilities.
Code is very good at dealing with process, precision, and syntax. It’s repeatable and precise, but brittle. LLMs are the converse - not repeatable (they are stochastic unless you set temperature to 0) and not precise - but they are very broad, very flexible, and starting to be capable of dealing with semantics - meaning and goals.
Building programs out of these pieces is a little like building a cyborg - you need both kinds of pieces, and the boundary between them is hard to get right. It’s important to think in terms of what the code is good for vs. what the models are good for, and to partition tasks accordingly.
For example, I’ve been working on generating very long documents, like textbooks with more than 60K words in them. The models are very good at generating content in small pieces, but less so at keeping track of all of the pieces and making sure they all get done - that’s a good job for code.
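To make that concrete, here’s a minimal sketch of the split, assuming an outline already exists: the model writes each section, while plain code owns the outline, the checkpointing, and the “did everything get done?” bookkeeping. The `call_model` helper is a hypothetical placeholder for whatever inference client you actually use.

```python
# A minimal sketch of the division of labor. `call_model` is a
# hypothetical placeholder - wire it to your model provider.
import json

def call_model(prompt: str) -> str:
    raise NotImplementedError  # placeholder for a real inference call

def generate_book(outline: list[str], state_path: str = "book_state.json") -> dict:
    # Code owns the bookkeeping: which sections exist, which are done.
    try:
        with open(state_path) as f:
            sections = json.load(f)
    except FileNotFoundError:
        sections = {title: None for title in outline}

    for title, text in sections.items():
        if text is not None:
            continue  # already generated - safe to resume after a crash
        sections[title] = call_model(f"Write the textbook section titled: {title}")
        with open(state_path, "w") as f:
            json.dump(sections, f)  # checkpoint after every section
    return sections
```

The point isn’t this particular loop - it’s that resumability and completeness checks live in ordinary code, where they are cheap and reliable, while the model only ever sees one small piece at a time.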
One other pattern that is emerging: it’s actually quite helpful to ask “how would I do this task?” as a precursor to designing a program. Often, what is hard for a person is hard for the model, and vice versa. So trying to get the model to do something all at once in some rigid way rarely works well. Breaking the task down into smaller pieces and letting the model ‘reason through it’ works better - see the sketch below.
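One way to render that advice as code - hedged, since the right decomposition is always task-specific - is a plan-then-execute loop: one call asks the model how it would break the task down, then one small call works each step. This reuses the same hypothetical `call_model` placeholder as the sketch above.

```python
# A plan-then-execute sketch of "break it down and let the model reason
# through it". The two-phase structure is one possible decomposition,
# not the only one.

def call_model(prompt: str) -> str:
    raise NotImplementedError  # placeholder for a real inference call

def solve(task: str) -> str:
    # Phase 1: one call to produce a plan - the model's own breakdown.
    plan = call_model(f"List the steps, one per line, to accomplish: {task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    # Phase 2: one small, focused call per step, carrying results forward
    # so no single call has to hold the whole task at once.
    result = ""
    for step in steps:
        result = call_model(
            f"Task: {task}\nWork so far:\n{result}\nNow do this step: {step}"
        )
    return result
```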
I’ll be sharing more tools and ideas and some code in the coming weeks. This moment feels to me, as an engineer, a lot like the early web, where the development patterns were similar to what came before, but not identical. Some desktop techniques were useful in, for example, building a web page, but for building fully scalable web-native applications, we had to learn which things worked and which didn’t in the new context.
I believe we will all spend the next few years repeating this process, but with AI - as model capabilities emerge and develop, we will begin learning the right patterns and best practices. It’s important right now to be open-minded and curious, and to read and experiment a lot. There will be lots of false starts and funny ideas in the early stages - that’s ok!
Love this articulation of the process — more a negotiation than a dictation. This seems like progress, even though the outcomes are much more variable.
In a way, we've spent 30 years trying to figure out how to tell the machine what to do, and now we've come to the moment where the machine is asking for more agency to work things out for itself — and this is a logical and necessary evolution to higher forms of computing.