LLMs can only generate
There are lots of challenges in building more complex and robust applications that incorporate generative models. One of the more frustrating ones is that it *seems* like you’re talking to a reasonable “person”. It’s ok that they make a few mistakes, right? We do what we’d do with an actual person - ask them to fix the mistake.
Except a generative model doesn’t just fix the mistake - it (re)generates the entire piece of work, from scratch, trying to fix the mistake. It’s easy to see with DALL-E. Tell it to make a sign with some words and it will usually misspell or malform some of the letters. That’s ok! Just tell it to redo one word - except it can’t. You’ll get an entirely new image every time.
We want, and need, these systems to be reliable in order for them to be valuable. People aren’t reliable, but we get along just fine. What’s the difference? At least part of it is that we can *iterate* with people (and with ourselves when doing a task). And that iteration can move flexibly between scales and scopes. If we have a big task to do, we start by sketching out the overall flow, then we work on smaller pieces (sometimes linearly, sometimes not). Then we refine and gradually get to smaller and smaller pieces. Sometimes we have to back up - we rewrite or rework a big piece of content - and that’s annoying! Imagine trying to work if you had to do that every time.
What does this tell us about building with generative models? That scoping the work is really important. You have to restrict the model to generating only what you want it to generate. You can’t give it a large artifact and expect it to only change part of it and preserve the rest, like a human would. This is a great job for code, in the “think with the model, plan with code” sense - using code to break up and isolate parts of the problem so the model can’t “get into trouble”.
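To make that concrete, here’s a rough sketch of what scoping with code can look like. `generate()` is a hypothetical stand-in for whatever model call you actually use, and the document structure is invented for illustration - the point is that the code, not the model, decides which piece gets regenerated and guarantees the rest is preserved untouched.

```python
def generate(prompt: str) -> str:
    """Hypothetical model call: takes a prompt, returns generated text."""
    raise NotImplementedError("wire this up to your model of choice")


def fix_section(document: dict[str, str], section: str, feedback: str) -> dict[str, str]:
    """Regenerate exactly one section; the surrounding code preserves everything else."""
    prompt = (
        "Rewrite only the following section to address this feedback.\n"
        f"Feedback: {feedback}\n\n"
        f"Section:\n{document[section]}"
    )
    revised = dict(document)            # untouched sections are copied, never regenerated
    revised[section] = generate(prompt)
    return revised


# Usage: the model never sees (and so can never rewrite) the intro or conclusion.
# doc = {"intro": "...", "methods": "...", "conclusion": "..."}
# doc = fix_section(doc, "methods", "the second paragraph contradicts the intro")
```

The design choice is the whole point: the model only ever receives the one section it’s allowed to change, so it literally cannot “get into trouble” with the rest.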
It’s hard to keep this in mind because the interaction feels so natural, but generative models can ONLY generate. They can’t read, they can’t modify. They can just take some input and generate some output. Everything else - the iteration, the selection of scope and context, the construction of the prompt - all of it has to come from outside the model somehow. Right now that’s mostly human effort; hopefully, more and more of it will come from better coding practices.
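Here’s one way that “outside the model” part can look, again as a sketch rather than any particular API: `generate()` and `check()` are hypothetical placeholders, and the thing to notice is that the loop, the judgment about whether the output is good enough, and the construction of the next prompt are all ordinary code.

```python
def generate(prompt: str) -> str:
    """Hypothetical model call: input text in, generated text out. That's all it does."""
    raise NotImplementedError


def check(output: str) -> str | None:
    """Placeholder validator: return a description of the problem, or None if it's fine."""
    return None


def iterate(task: str, max_attempts: int = 3) -> str:
    """The iteration lives here, in plain code, not in the model."""
    output = generate(task)
    for _ in range(max_attempts):
        problem = check(output)
        if problem is None:
            return output
        # The next, narrower prompt is also constructed outside the model.
        output = generate(
            f"{task}\n\nPrevious attempt:\n{output}\n\nFix only this problem: {problem}"
        )
    return output
```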