One of the debates in the tech industry these days is whether coding with an AI assistant is “worth it”1. That’s not actually how the question is usually phrased, though, since “worth it” is a hard thing to define accurately. Usually, it’s asked in terms of speed, which is a simpler question than “does AI make you more productive as a coder?” or even “are new and more valuable things being done with AI that wouldn’t have been done before (like prototypes and other explorations)?” Even that question is complex though, as we will see.
I had a weird fantasy the other day: imagine if birds had invented airplanes. Think about this for a second. That’s a little like humans, whose main advantage is our brains, inventing LLMs, which can do some of the things our brains can, but work differently. What would birds say about airplanes? Are they “worth it”? I can just fly to that tree over there! What’s the point of a big, expensive, loud plane? You have to wait in line, and it takes longer to get anywhere nearby! It breaks in novel ways that birds never break in! It needs all this fuel and infrastructure, look how expensive that is! Sure it flies, but if your wings aren’t flapping, is it really flying (thinking)? Look, if you turn the steering wheel a certain way, it crashes, and birds never crash (well, except sometimes into those window thingies. Look! Airplanes even have windows ON them! Think how hazardous they are!)2 Most birds learn to fly quickly and that’s that; you have to train to fly an airplane, and then it’s super hard to do well. It’s just not worth it!
But of course, there aren’t many birds who can fly across oceans, or can fly as fast as a plane. In our absurd fantasy, there would be uses of the airplane that even birds would find valuable - and perhaps they’d invent new migrations or societies with the airplane. And so on. Airplanes wouldn’t really be good or bad for birds - just different in value in different contexts. But it’s easy to imagine the “why not” stories birds could tell themselves if they felt threatened that something else could now do their core “value” of flying.
The confusion about code - like the absurd story of the birds and their airplanes - is another example of our old friend, dimensional reduction. This is when we take a complex, multi-dimensional problem and try to reduce it to a single dimension so that it’s easier to talk about3. This pattern happens all over the place - more or less, any time someone asks the question “Is A better than B?”, there is a possibility that it’s happening. That comparison, “better”, is a metric - and metrics only work in one dimension. It’s like looking at a red car and a yellow truck and asking which is better … well, it depends on whether you’re evaluating them as trucks or as cars, or whether you’re evaluating by color. Maybe you want a red truck, and neither is “better” in that reduction!
We’re doing that here with the AI coding problem. We are looking at many different kinds of coding - new code, code in old codebases, frontend code, backend code, short little projects, long complex ones … there are a lot of dimensions to software. We are more or less discarding them to ask a single-dimensional “is it faster?” question. Even if we broaden out to the question we really want to ask, “is it worth it?”, we are still in yes/no territory, and we’ve still had to discard some of that nuance to ask the question.
The real answer, of course, is “it depends”. And it’s even more complex because there are kinds of coding that wouldn’t have happened at all without AI - people who didn’t have the ability to write code at all, or tasks that wouldn’t have been worth the effort. You get a divide-by-zero if you try to compare whether AI makes those “faster”!
To broaden this out a bit into other areas: we will wind up asking this question for many domains soon. Code is just an easy one to start with because it’s something the models do pretty well right now, the tooling is well developed, and it’s measurable. But eventually AI will get “plumbed” into all kinds of things.
When we reduce these complex situations to single dimensions, we gain clarity and simplicity in the argument - we can talk about better or worse - and it feels more effective. But in fact, what we are doing is arguing about a summary or analogy, a reduction in information from the real complexity of the issue. This makes it easier to argue, but harder to understand and come to useful conclusions.
It’s frustrating not to have neat answers, but complex situations actually need more complex dialog to understand. Think of the birds and the airplane!
Really the debate is whether AI overall is worth what we are investing in it. But coding is a little easier to wrap our heads around, and I think it’s the first really economically useful activity with LLMs, mostly for mundane reasons: code is all text and the tools are easy for an AI to access (e.g. GitHub and other command-line tools). That “plumbing” problem will get solved for more and more domains over time, so take this whole letter as a specific instance (coding) that will likely apply more broadly.
You can really run with this analogy a long way. The crash example, for instance, is a lot like people finding a single specific task that an LLM fails on and then generalizing to “this is totally useless”. I find that almost every time I hear commentary on LLMs now, if I translate it into “birds inventing airplanes” land, it sounds silly, at best.
I can’t find the quote right now, but I read a poem once to the effect of “this is an age of miracles. Man can fly, but now the air smells of gasoline”. Everything is complex.
And don't forget ratites - flightless birds that could use an airplane to get off islands, or travel further.
This reminds me of when I bought my first calculator - a Sinclair Scientific. A friend said it was a waste as he could use log tables much more quickly to multiply and divide numbers, and then tried to prove it in a race. Of course, other functions were ignored!
However, the danger of LLMs and coding is that you don't know if the output is any good. Do novices bother to test the output? Will production code introduce security bugs and malware?
[This isn't to say that hand-coding doesn't do the same, as we regularly find out.]
My analogy with calculators is what I call "number blindness". I am old enough that I was taught to estimate expected answers, and of course, you needed to do that when using a slide rule. Even with a computer I always "sanity check" output to ensure my code is not making gross errors. However, when teaching students, I discovered that when they used calculators, even relatively simple calculations would sometimes come back with grossly wrong answers, due to incorrect use of the calculator. Without knowing the likely answer in advance, students would not understand that they had made a mistake. By then it was too late to instill the idea of estimation.
And so it is with LLM coding. It is like a spreadsheet that works for simple, single functions, but starts to fail when more complex equations are built up. That is the danger with LLMs. It looks like the code is correct; it might even pass a few unit tests. But it may not catch edge cases and, worse, it may produce code that the user doesn't know how to debug.
You can put all the caveats you like on this, but there is a danger that crap code will poison the well as it becomes the training data for the updated LLM.
Isaac Asimov once wrote a short story about how automation that was largely opaque to the user/manager was subtly producing incorrect answers and thereby undermining the global economy. (I cannot recall if it was deliberately done by robots or not.)
Bad actors are "why we can't have nice things". LLMs can be helpful both to good and bad actors. As we see from the flood of mis- and disinformation by bad actors, there is no reason not to suspect that the same won't apply to code as complexity increases and we rely on libraries designed to be malware. Even birds making airplanes might have to deal with saboteurs using more subtle means than blowing up production lines.
This is so beautifully written, Sam! This is clearly the most complex tech-dialogue of our times and you've done great justice to it by saying, "it depends", because it quite evidently does.
I've never put much thought into the "greying of hair", but after reading this, it struck me that it mirrors how, as we grow older, we see grey more than we see black and white. This is super grey, and I hope people don't give in to the temptation of declaring it black/white.
Loved the parallel you drew with the birds, the plane and the steering.