There's a lot of excitement right now about new AI models like ChatGPT, GPT-3, and others. That excitement is justified. There are lots of really powerful new things that can be done with them, and we are almost certainly going to spend the next several years figuring out new coding techniques and best practices for building with them.
One thing that's striking, though, is that we are back in a land of scarcity in a way I haven't seen in a while. At the start of my career (way back in the '80s), programmers had to spend a lot of time worrying about memory, CPU performance, and the like. We still do, but the systems we work with now have much more headroom, and scarcity can often be solved with dollars - you're not worrying so much about whether you can get something to work at all, you're worrying about whether your cloud bill will be too high. And some of these problems, like graphics performance, have been solved by dedicated chips for decades now. It's rare that we have hard performance constraints unless we also have hard financial ones.
But in the earlier phases of my career, heroics were often needed to get anything working at all. Thirty-three years ago, a friend of mine and I built a video game that ran on the Mac Classic. The screen was tiny (512 x 342 pixels, less than a square inch on an iPhone!), the game was monochrome, and we still struggled to get the frame rate up to 16fps, the bare minimum to make it usable. We wound up having to do things like run-length encoding of the frame buffers and hidden line removal so there were fewer pixels to think about and draw (there's a sketch of the encoding idea below). Flash forward to 2006: writing JavaScript for the early versions of Writely, the browser runtimes were very limited - not much memory, and a limited number of cycles per request - so we had to be very careful about what we did on the client. We let the browser do most of the work and only did things like diffing in the most basic, optimized way.
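To make the frame-buffer trick concrete, here is a minimal sketch of run-length encoding applied to a single monochrome scanline. This is purely illustrative - the original was hand-tuned Mac code, and the names and Python here are invented for the example - but it shows why a mostly-blank screen compresses so well:

```python
# Minimal sketch: run-length encoding of a monochrome scanline.
# Illustrative only; names and structure are invented for this example.

def rle_encode(row: list[int]) -> list[tuple[int, int]]:
    """Compress a row of 0/1 pixels into (value, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1] = (pixel, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((pixel, 1))  # start a new run
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (value, run_length) pairs back into a row of pixels."""
    row: list[int] = []
    for value, length in runs:
        row.extend([value] * length)
    return row

# A mostly-white 512-pixel row (one Mac Classic scanline) collapses
# from 512 values to just three runs.
scanline = [0] * 200 + [1] * 16 + [0] * 296
encoded = rle_encode(scanline)
assert rle_decode(encoded) == scanline
print(encoded)  # [(0, 200), (1, 16), (0, 296)]
```

The payoff is that unchanged or uniform regions cost almost nothing to store and compare, which is exactly what you want when every byte and cycle counts.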
I see this in AI today, in many ways. There are time constraints - larger models take a long time to run, so you can't use them for anything real-time. There are space constraints - LLMs have token window limits, so there's only so much data you can pass in to get a result (a sketch of packing input into a budget like that follows below). And there is hardware scarcity - GPUs are in demand and there is general pressure on chip supply, so it's hard to build a large service or to consume arbitrarily large amounts of compute.
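As an illustration of working inside a token window, here is a rough sketch of greedily packing text chunks into a fixed budget. The 4-characters-per-token figure is an assumed average for illustration only (a real tokenizer gives exact counts), and the function names are invented for the example:

```python
# Rough sketch: fitting input chunks into a fixed token window.
# The chars-per-token ratio is an assumed average, not an exact tokenizer.

CHARS_PER_TOKEN = 4  # assumption for illustration; varies by model and text

def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def pack_context(chunks: list[str], budget_tokens: int) -> list[str]:
    """Greedily keep the earliest chunks that fit within the token budget."""
    packed: list[str] = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break  # anything past this point must be summarized or dropped
        packed.append(chunk)
        used += cost
    return packed

# e.g. fit document paragraphs into a 4096-token window,
# reserving ~512 tokens for the prompt and the model's answer.
paragraphs = ["..."]  # your source text, split into chunks
context = pack_context(paragraphs, budget_tokens=4096 - 512)
```

A real system would do something smarter than "keep the first chunks that fit" - rank by relevance, summarize the overflow, and so on - but the budget arithmetic is the same either way.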
All of this points back to a kind of engineering that may not be familiar to folks newer to coding (note how old my examples have to be!). It's easy to look at a set of constraints like that and think "well, it can't be done right now, I'll just wait for the world to solve it for me", because, for a long time, that has been the right approach - speed to market mattered more, and scale was mostly a matter of dollars. Those dollars used to be cheap; now they're not, and the things they can buy (non-GPU servers, hard disks) aren't as relevant to the next wave of problems. It's time to optimize!
Almost every major technical transition (and we are for sure in one with AI now) comes with this kind of challenge in the early phases. Tools are weak and ill-suited to new tasks, performance is a challenge on many dimensions, and optimization techniques may be non-existent or not widely known. It's a great time to dust off your CS fundamentals and do some heavy lifting. This is a mindset shift if you aren't used to it, but for many of us, it's a flashback to how things used to be all the time. It's not fun, but it is satisfying when you find your way through what looked like an impossible thicket.
As we go through this transition, it's important not to take "no" for an answer when confronted with these kinds of problems - our whole job as engineers is to find a way through them, and in this moment, those skills are more valuable than ever.