Prompt engineering is spaghetti code
I was at the University of Michigan this week giving some talks (Go Blue!). One question that came up was “Is prompt engineering a new software discipline?” My answer was no - I think it’s transient. So why is that?
“Prompt engineering” is a product of the current scarcity mentality - we don’t have many GPUs to go around, so folks are trying to make every inference count. This strikes me as premature optimization - you’re trying to get something to scale (use less compute) before it’s really working well.
I think of LLMs as programming tools - it’s an API, so call it! Sometimes you get better results by doing multiple inferences. For example, I had a case where all I wanted was the final answer, not the long explanation the model produced along the way. No matter what I did, I couldn’t get the model to produce the answer without the explanation. But when I sent the initial output to a second prompt whose instructions were “read this and extract the answer only”, it worked perfectly.
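That two-call pattern can be sketched in a few lines. Here `complete` is a stand-in for whatever LLM client you actually use (a real deployment would swap in an SDK or HTTP call); the function names and prompt wording are illustrative assumptions, not a specific API:

```python
def complete(prompt: str) -> str:
    """Placeholder for a real model call - wire up your LLM client here."""
    raise NotImplementedError("replace with your LLM API client")


def answer_only(question: str, complete=complete) -> str:
    # First inference: let the model reason at whatever length it wants.
    explanation = complete(question)
    # Second inference: a small, single-purpose prompt that does one thing -
    # pull just the final answer out of the first model's output.
    extraction_prompt = (
        "Read the following and extract the final answer only, "
        "with no explanation:\n\n" + explanation
    )
    return complete(extraction_prompt)
```

Each call has one job, and the second prompt never needs to know how the first was phrased - which is exactly what makes the pieces easy to swap or improve independently.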
This is basic modular design: build small pieces that each do one thing well, and string them together. By extension, I think most of what people are doing with large, hand-crafted prompts is akin to writing spaghetti code - big hacks that work but are brittle and hard to change or improve.
I’m not an AI researcher, I’m an app builder. So, I look at AI (LLMs or other models) as programming components to be used as aggressively as needed to solve the problem. Call them! A lot! Don’t worry about scale and cost until you have something actually working well - there’s no sense in optimizing something that doesn’t actually work yet, which is what you are implicitly doing when you try to minimize the number of inference calls.
(Disclosure: I do work for Microsoft, so in theory I am telling you to spend more money on our AI services (or someone else’s!) - but that’s not the point, and I would say this even if I didn’t work there.)