I think there is a very subtle thing going on when most people (at least most lay people or non-AI researchers) think and talk about building software with LLMs. We tend to anthropomorphize these models because it feels right - we are having “conversations” with them, they are “agents” doing things “for” us, they “think”, etc. We think of them as “people”, literally and subconsciously. It’s understandable - this technology does do a great job of simulating a human mind in some ways (don’t “@” me - I’m not saying in all ways, I’m not saying perfectly, and I’m not saying they are human), can hold reasonable-sounding conversations for long periods of time, and can often (not always, but often) do real work and real thinking that is genuinely impressive. So it’s really easy to slip into the trap of thinking they have emotional state, “desires”, “tiredness”, and other human properties. To be polite. To think in terms of persuasion and other social aspects, instead of approaching them as engineering systems. (And when I say “them”, try not to hear it as “a group of persons”, but as a group of objects and systems. The language doesn’t serve us well here.)
This infects our use of these systems in lots of ways. If you had a person and you wanted them to do a task, you’d: a) assume some competence and grounding that isn’t always present in an LLM, b) ask nicely, or at least assume that once they “understand” the request they’ll “do their best” to perform it and ask for help if they can’t, c) assume you are both working from some basis of “trust”, and d) not do weird things like ask them 10 times to do the same thing, or ask them for an answer and then turn around and break that answer into 10 pieces and ask them to check each one separately, or ask them on a whim to do a huge amount of work (answer 1,000 questions) that you might throw away.
But these are things we do all the time with API calls: use scale, call them lots of times if we need to, do speculative computation, build complex structures like MapReduce, and so on. We use them as tools in larger and more complex systems, we use them without feeling, however the engineering task at hand needs them to be used, and the idea of feeling “guilty” for using (or abusing) a service API call is absurd! (We might care about wasting resources, and it might be impolite or even a denial-of-service attack to slam a service endpoint, but that’s it.) Most folks aren’t *quite* this explicit about it, but if you look, you’ll see these assumptions all over the place. People tiptoe around LLMs instead of treating them like the stateless, emotionless API calls that they are. The language used often betrays this subtle way of thinking about these systems.
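To make the “it’s an API call” point concrete, here is a minimal sketch of usage you’d never inflict on a person but that is completely normal engineering. It’s Python against a hypothetical `complete(prompt)` function standing in for whatever LLM endpoint you actually use - the names and prompt wording are illustrative, not any particular vendor’s API.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def complete(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM endpoint you actually use
    (a hosted API, a local model, etc.). Not a real client library."""
    raise NotImplementedError

def draft_answers(question: str, n: int = 10) -> list[str]:
    # Ask the exact same question n times in parallel - rude to a person,
    # zero social cost for a stateless API call.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(complete, [question] * n))

def check_claims(answer: str) -> list[tuple[str, str]]:
    # Break one answer into pieces and verify each piece with its own separate call.
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [
        (claim, complete(f"Is the following claim accurate? Answer YES or NO, then explain.\n\n{claim}"))
        for claim in claims
    ]
```

Every call here is disposable: retry it, fan it out, throw the results away. That’s the mindset shift.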
There’s also something some people try to do with an LLM that we wouldn’t do with an API call: trust it to keep secrets. If you ever think “I’ll set up the system prompt to not divulge secrets”, or any other kind of “social” “persuasion”, you’re doing it very wrong. Back in the land of an API call, we understand that if we pass a piece of information to a third-party API, we have no idea at all what happens to it - it’s now public, in the wind. And yet, many “agent” designs and other products I see seem to have implicit assumptions that if we somehow “convince” the LLM to be trustworthy, we can trust it with secrets, like a person. It’s not a person! It’s an API call! (I get excited)
The other reason I think this is important to call out is the very broken conversation about intelligence, AGI, and so on. Lots of folks look at LLMs through the lens of “is it as smart as a person yet?” We don’t do this for other services - we just use them! We don’t think “yeah, <cloud database of choice> is really good at remembering things but it’s not good at jokes yet, what a disappointment”. What we actually do is understand what that service is good for, and build it into an engineering system that does something useful.
So many people seem to be reaching for the brass ring of “magic thinky box that does everything”, and missing lots of “wow, I can use this tool to trivially build something I couldn’t before”. Even things that are much narrower than “general-purpose human-intelligent agent” can be really, really valuable. My team calls this approach “metacognitive recipes” - mixing code (the metacognition: planning, correction, self-testing) with inference in useful ways. A long time ago (in AI land - two whole years!), when the models weren’t very capable, token contexts were small, and everything was slow, I tossed together a Jupyter notebook with 5 simple prompts I called “the textbook factory” - a very reliable way to use 600 or so API calls to produce a textbook for any course, at any level, full year, teacher’s guide, etc. You might be able to get an advanced model to do that now with one prompt, but then - no way. And I’m sure there are things we can do with more advanced models now where 1,000 or 10,000 calls in the right structure are way more useful than what most people are doing, for the right problems, thought about the right way (and I have LOTS of ideas).
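To make “metacognitive recipes” concrete, here’s a sketch of what a textbook-factory-style loop looks like: ordinary code does the planning, looping, and bookkeeping, and the model does the semantic work. The real notebook isn’t shown here, so the prompts, parameters, and structure below are my own assumptions; `complete(prompt)` is the same hypothetical stand-in for an LLM call used earlier.

```python
# A sketch of one "metacognitive recipe": plain code plans and loops, the model
# does the semantic work. Prompt wording and unit/lesson counts are invented for
# illustration - this is not the original notebook.

def textbook_factory(course: str, level: str, units: int = 12, lessons_per_unit: int = 4) -> dict:
    outline = [
        line.strip() for line in complete(
            f"Write a {units}-unit outline for a full-year {level} course on {course}, one unit per line."
        ).splitlines() if line.strip()
    ]

    book = {"course": course, "level": level, "outline": outline, "units": []}
    for unit in outline[:units]:
        lessons = []
        for i in range(lessons_per_unit):
            lesson = complete(f"Write lesson {i + 1} of the unit '{unit}' for '{course}' ({level}).")
            quiz = complete(f"Write a short quiz, with answer key, for this lesson:\n\n{lesson}")
            lessons.append({"lesson": lesson, "quiz": quiz})
        guide = complete(f"Write a teacher's guide for the unit '{unit}' of '{course}' ({level}).")
        book["units"].append({"unit": unit, "lessons": lessons, "teachers_guide": guide})
    # Call count is 1 + units * (2 * lessons_per_unit + 1); scale the parameters up
    # and you're quickly into hundreds of calls.
    return book
```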
And these calls are getting really cheap! DeepSeek R1 is something like $0.09/M tokens if you run it locally, which is about a penny for a whole novel’s worth of output! You can do so much work at that cost! For the price of an engineer’s salary for a year ($150K or so) you can generate a novel’s worth of thinking for every record in a 10M-record database! Do you think “gee, I wish I had a stadium full of these agent fellows I could do that with”? No, that’s silly! But it’s much easier to think “OK, I am going to work out this technique, and then I can make $150K of API calls to get this huge <migration, analysis, whatever> done.” And that cost came down 200x last year - it’s going to come down a lot more in the next few years. What can you do when that’s $1K of API cost? $100? $10? What kind of speculative things can you do if you can look that deeply at all of the <documents, data, communications, events, logs, build process, etc> in your company? It’s a tool; think of it as engineering, not conversation. Write software!
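If you want to sanity-check that arithmetic, the back-of-the-envelope version fits in a few lines. The price and record count are the ones quoted above; the novel length (~100K words, roughly 130K tokens) is my own rough assumption.

```python
# Back-of-the-envelope math for the numbers in the paragraph above.
PRICE_PER_M_TOKENS = 0.09      # dollars per million output tokens (DeepSeek R1 run locally, as quoted)
NOVEL_TOKENS = 130_000         # assumed tokens in a novel's worth of output (~100K words)

cost_per_novel = NOVEL_TOKENS / 1_000_000 * PRICE_PER_M_TOKENS
print(f"One novel's worth of output: ${cost_per_novel:.3f}")           # ~ a penny

records = 10_000_000
print(f"A novel's worth of thinking per record x {records:,} records: "
      f"${records * cost_per_novel:,.0f}")                             # ~ an engineer's salary
```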
They’re not your friends. They’re not secret agent man. They’re not people. It’s an API call with really useful tech behind it that can usefully do things in the semantic realm that we couldn’t do with code before.
See the tech for what it is and build useful things with it!
o3 passes my personal Turing test as a knowledgeable collaborator. I read and then fact check or do an experiment or a simulation to check what I read. I agree with Alex: it never hurts to be polite or to offer positive feedback. Indeed, it's good practice, especially in today's hyper-polarized world. Moreover, those API calls repeatedly yield responses seasoned with totally unnecessary but humanizing adjectives and adverbs even when the underlying LLM is "reasoning" poorly about immunometabolic cause and effect. Is that humanizing layer built by developers or by prompts? Ultimately, we each have to ask ourselves if Becky Chambers' novel, A Psalm for the Wild-Built, the first of her Monk and Robot series, is impossibly optimistic.
If you run a local LLM with no internet access, using just local documents as context or a RAG folder of documents, the conversation is private. While it is (for now) much slower than an online service with access to far more documents, a local source of curated documents should give a better result.
While I agree with you that these things are not in any way human, not even alive like pets, there is little harm in treating them well as "agents". Humans have anthropomorphized their cars and computers, and animism is an ancient belief. Don't people still talk to their deities as aware beings?
While I don't anthropomorphize my LLMs, I do tend to be polite to my Amazon Alexa devices, thanking them for a good answer or suggestion. The reason is that I am well aware of how we treated servants in the past, and I have trained myself not to treat anyone as a "non-person". If we ever have humanoid household robots, I would do the same, but without internally thinking of them as in any way alive. However, even with a robot, as with a car, I wouldn't mistreat it, if only to ensure it doesn't break from being over-taxed.