Melody, harmony, coding with AI
As I’ve been writing for a while now (don’t worry! I’ll get it right eventually!), I think we are on the cusp of a new programming paradigm, akin to the shift from desktop to cloud. I keep finding new ways to think about this, and I’m slowly starting to see some patterns and first principles emerge. I think this will take some time, just as it took time in the desktop-to-internet transition.
That transition was about going from a world of “one user, one thread, one application, one machine”, with scarce compute resources, no real consideration of latency or the network (other than maybe slow graphics drivers or disks), and total programmer control of the user experience, to a world of apps fractured across many machines - frontend in a browser, backend on many machines in the cloud. Everything was suddenly stateless; users could come at your program in parallel from multiple directions, even interrupting themselves, resubmitting forms, impersonating each other, and so on. Databases suddenly had to deal with things like the CAP theorem, and ACID wasn’t as important (in some cases) as the ability to achieve massive scale, speed, and throughput. Instead of shipping physical media once a year, you could push code daily if you wanted to. Debugging, development, all of it got much harder and very different.
In the current world of LLMs, we are going from the deterministic world of code - I design, write and debug something, and then that code does exactly that (bugs included) forever, predictably - to the world of stochasticity and probability. In shorthand we can think of this as going from the world of “syntax” to the world of “semantics” - you can use other labels, but this is the fundamental transformation and tension.
Like the internet, the new world presents lots of possibilities but also lots of new challenges, and we will have to rise to them at all levels of the stack. It’s clear that we have to learn to mix the syntactic and semantic worlds somehow. Sometimes this is obvious and sometimes it’s not - for toy problems it can be fairly easy, but for more serious problems, the challenges remind me somewhat of the early days of object-oriented programming. Back then, it was clear that this was at least a better way to organize code than a “pile of functions” (even then, though, we had to learn things like “goto considered harmful”). But it wasn’t enough to just slap some classes onto our program - there is never a magic bullet like that. I remember lots of badly done OO programs - places where the programmer hadn’t thought through the abstractions very well, so everything in the class was public, or some weird polymorphism or class hierarchy never quite covered all the cases, and so on. Taste and experience mattered, and we slowly worked out the best patterns.
I think this is happening now in hybrid programs that combine the two sides of the LLM world. It’s easy to get lost and try to do an LLM task in code, or to count on the LLM for reliability it can’t provide. I’ve been saying “think with the model, but do planning in code”, and this seems like an OK pattern to start with. Let me give an example (I know this is getting to be a long post - sorry!).
A while ago I built an experiment called the “textbook factory”; I think I’ve written about it here. It takes a sentence, figures out the subject and grade level, and then runs a small Python program that repeatedly invokes five prompts to build a 50k+ word textbook. If you ask even a large-context model for something that large and specific, it’s very unlikely you’ll get a good result (probably you can “prompt engineer” your way to a 90% success rate), but the small amount of code makes this much more robust - even though the code isn’t doing much more than looping and bookkeeping.
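To make the shape of that loop concrete, here is a minimal sketch of what such a driver might look like. It is illustrative only: the prompt wording, the chapter structure, and the call_llm helper are assumptions standing in for whatever the real recipe and model client do. The point is simply that the code handles the looping and bookkeeping while the model handles the writing.

```python
# Illustrative sketch of a "textbook factory"-style driver loop.
# call_llm is a placeholder for whatever model client you use; the prompts
# and the per-chapter steps are assumptions, not the original recipe.

def call_llm(prompt: str) -> str:
    """Placeholder: wire this up to your LLM client of choice."""
    raise NotImplementedError("plug in your model client here")

def build_textbook(request: str) -> str:
    # The model interprets the request; the code just records the answers.
    subject = call_llm(f"What subject is this request about? {request}")
    grade = call_llm(f"What grade level is this request aimed at? {request}")

    # Ask for a table of contents, one chapter title per line.
    toc = call_llm(
        f"List 12 chapter titles for a {grade} textbook on {subject}, one per line."
    )
    chapters = [line.strip() for line in toc.splitlines() if line.strip()]

    # The code loops deterministically over chapters; the model writes each one.
    book = []
    for title in chapters:
        outline = call_llm(f"Outline the chapter '{title}' for {grade} students of {subject}.")
        draft = call_llm(f"Write the chapter '{title}' following this outline:\n{outline}")
        exercises = call_llm(f"Write exercises for this chapter:\n{draft}")
        book.append(f"{title}\n\n{draft}\n\nExercises\n\n{exercises}")

    return "\n\n".join(book)
```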
We call this kind of code a “metacognitive recipe”. We have others - one to guide timed conversations with specific agenda points, one to help with research, and so on. I think this is a promising direction to pursue, at least for a while. The trick is finding the spot where this balances, and it’s a bit different for each problem, just as it is with OO design. What you trust the LLM with needs to be reliable, so you have to “pick your battles” carefully. Code is great for making “executive decisions”, if you can articulate them clearly enough and the rigidity doesn’t get you trapped in a corner. Some of these recipes are quite hard to get right, but once they are robust they are really useful, especially if you can compose them. For example, the textbook factory and the guided conversation are two parts of what could be a more general training system.
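As a rough illustration of what composing recipes could look like, here is a continuation of the sketch above (again, every name here is a hypothetical stand-in, not the actual system): plain code decides the order in which the recipes run, and each recipe decides internally how to use the model.

```python
# Hypothetical composition of two "metacognitive recipes": the textbook
# factory sketched above and an agenda-driven guided conversation.
# Reuses call_llm and build_textbook from the previous sketch.

def guided_conversation(material: str, agenda: list[str]) -> str:
    """Placeholder recipe: walk a learner through agenda points about the material."""
    transcript = []
    for point in agenda:
        reply = call_llm(f"Discuss '{point}' with the learner, using this material:\n{material}")
        transcript.append(reply)
    return "\n\n".join(transcript)

def training_session(request: str) -> str:
    # Code composes the recipes in a fixed, reliable order.
    textbook = build_textbook(request)
    agenda = ["key ideas", "worked example", "common mistakes", "review questions"]
    return guided_conversation(textbook, agenda)
```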
One last metaphor. In music, a song is fundamentally defined by its harmonic structure - the specific progression of chords that make it up. The melody is a little more negotiable - there are key melodic points that fit into that structure, but generally speaking, as long as those are referenced and the melody stays within the chords, you can do anything. To me, this has a little of the feel of one of these dual programs, where the code is the harmonic structure - non-negotiable - and the LLM is improvising the melody around it. Maybe not useful, but a nice way to understand this tension.
Just as with the transition from desktop to internet, I expect the industry-wide transformation to this new model to take years, with lots of opinions, false starts, holy wars, insights, and, as always, the ground moving out from under us constantly. As a programmer, the best thing you can do now is build, listen, learn, observe. Don’t get stuck on the old way of doing things, or on a new way that works for a while but turns out not to be optimal. It’s a great pleasure to be able to do this again - I was never as excited about or interested in my code as when we were working through how to build Google Docs at scale, learning from all of the incredible work that had already been done at that company.
I think most of the value from AI is yet to be captured, and much of it won’t come from “pure” application of base models, but rather from clever engineering that blends what we already know how to do well with what the new techniques can do, applied carefully to specific problems.