What is a recipe for cake?
You’d know it when you saw it, right? Cake is … cake! It comes in lots of variations, and maybe at the edges it might be debatable if something is cake or something else, but mostly, we know what cake is and what a cake recipe would look like. You could plug different things into a recipe and get different kinds of cakes.
In my never-ending quest to get more useful experiences built with AI, I’ve been exploring an idea that I call recipes. These are parameterized mixtures of code and LLM inference. The code does most of the decision making, the model does the “heavy lifting” of thinking about things, and some parameters tell the whole system what the goal is. We built one that does guided conversations - time-based, agenda-specific conversations like medical triage, job interviews or form filling. The code enforces the time limit, makes the model do higher-level planning, and reliably keeps it on that plan. The model does the work of talking to the user and thinking.
This works well, but it’s really hard to say what is and what isn’t a recipe. There is real tension here between the fully deterministic (code) and the fully stochastic (inference). Neither end of that spectrum is what we’d call a recipe - code is code, prompts are prompts, both are useful, but both have challenges (code is powerful but brittle and rigid, prompts are flexible but flaky). A recipe is a successful mixture somewhere in the middle.
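The division of labour described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not our actual guided conversation recipe: `RecipeParams`, `make_plan`, `run_guided_conversation` and the `model` callable are illustrative names, the model is assumed to be any function from prompt text to response text, and a turn budget stands in for the time limit. The parameters carry the goal, the model proposes a plan and phrases each question, and plain code validates the plan against the agenda and decides when time is up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model interface: prompt in, text out. A real recipe would call
# an LLM here; any function with this shape works for the sketch.
Model = Callable[[str], str]


@dataclass
class RecipeParams:
    """Hypothetical parameters that tell the recipe what the goal is."""
    goal: str
    agenda: List[str]  # topics the conversation must cover
    max_turns: int     # time budget, enforced by code rather than the model


@dataclass
class ConversationResult:
    plan: List[str]
    answers: Dict[str, str] = field(default_factory=dict)
    turns_used: int = 0


def make_plan(params: RecipeParams, model: Model) -> List[str]:
    """Ask the model to order the agenda, then validate that plan in code."""
    raw = model(
        f"PLAN\nGoal: {params.goal}\n"
        f"Order these topics, one per line: {', '.join(params.agenda)}"
    )
    plan: List[str] = []
    for line in raw.splitlines():
        topic = line.strip()
        # Keep only agenda topics, once each; drop anything the model invented.
        if topic in params.agenda and topic not in plan:
            plan.append(topic)
    # Append agenda topics the model left out, so coverage never depends on it.
    plan.extend([t for t in params.agenda if t not in plan])
    return plan


def run_guided_conversation(
    params: RecipeParams,
    model: Model,
    get_user_reply: Callable[[str], str],
) -> ConversationResult:
    result = ConversationResult(plan=make_plan(params, model))
    for topic in result.plan:
        if result.turns_used >= params.max_turns:
            break  # the code, not the model, decides when time is up
        # The model does the talking: it phrases the question for this topic.
        question = model(f"ASK\nGoal: {params.goal}\nTopic: {topic}")
        result.answers[topic] = get_user_reply(question)
        result.turns_used += 1
    return result


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        # Stand-in for an LLM: its plan includes one off-agenda topic and
        # omits one agenda topic; otherwise it phrases a simple question.
        if prompt.startswith("PLAN"):
            return "allergies\nweather\nsymptoms"
        topic = prompt.rpartition("Topic: ")[2]
        return f"Can you tell me about your {topic}?"

    params = RecipeParams(
        goal="medical triage intake",
        agenda=["symptoms", "allergies", "medications"],
        max_turns=2,
    )
    result = run_guided_conversation(params, fake_model, lambda q: f"answer to: {q}")
    print(result.plan)  # ['allergies', 'symptoms', 'medications']
    print(result.answers)
```

Swapping in a different `goal` and `agenda` (interview topics instead of triage) yields a different conversation from the same code, which is the sense in which such a recipe is parameterized. The model's off-agenda topic ("weather" in the demo) is simply dropped, and the turn budget cuts the conversation off regardless of what the model would prefer.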
That middle ground, though, feels more and more like a fractal than a specific line we can define. A fractal’s measured length depends on the scale you measure it at, so the best you can do is get within epsilon of it. That’s why I used the cake analogy above. There’s also a spectrum from “just a list of ingredients” to “just a description of the cake” - a cake recipe is neither, but the precise point in the middle is hard to pin down. It’s similar with our recipes - it’s surprisingly hard to find the balance point between code and inference. Iteration and interaction with humans might help, but that feels like a bit of a cheat. We want programs that work reliably and that also cover lots of new ground. (The GCE, the guided conversation recipe mentioned above, is probably the most successful recipe so far - it turns out to be a very useful and composable element in many other problems. Maybe the issue is that we haven’t found enough of the primitives yet.)
Going forward, we are going to try to get crisper about the best practices and first principles that make a recipe successful. We are looking for “archetype” patterns that serve as composable parts of other recipes, and for techniques to manage and self-heal errors within more complex systems (like the idea of semantic telemetry, applied to these flows).
All of this will change, of course, as models get better at planning, metacognition, and interaction. Hopefully this is just a passing phase - the goal remains to go as directly as possible from human intention to machine action. Recipes are, for now, an interesting point along that road.
I have a lot of logic programming experience thanks to a well-spent youth, and find it interesting to see how non-determinism is interpreted today.
- For some reason people think that non-determinism is the new bogeyman that will ruin their perfect code. Spoiler alert -- if your code does any input or output, it is nondeterministic because the entire world outside your program is constantly changing.
- Because people assume their code is deterministic -- but it does input and output -- it's very hard to know whether the combined system of code + real world is reliable in any way apart from..... testing!
- People feel better about testing for the real-world type of non-determinism, despite the fact that they wrote a program that expects a static world -- the mismatch of design vs. reality is written off as just the way things work.
- Give someone a function that is non-deterministic and tells you so -- people freak out. Give people an API call that is non-deterministic behind the scenes in unknowable ways, and they will just test it a few times and write off the chance of it returning something quite different the next day, or when the toner runs out on level 4.
- Ask people to handle a new problem with their deterministic code and ... well, watch them copy the code into a new project, rename all their variables and probably introduce some new bugs into an otherwise well-thought-out way of thinking.
- Ask them to handle a new problem every minute.... they will say it can't be done.
But of course, if you ask someone whether their human brain thinks deterministically, leaving them unable to have original thoughts, you'll hear a different story.