Coding Agents Feel Cheap. That Might Not Last.

You open your editor, describe what you want, and a few seconds later there is code on the screen. It is not perfect, but it is usually good enough to keep moving — especially when you are exploring something new or trying to get to a first working version. That removes a lot of friction, and it is easy to see why most teams have folded these tools into their daily workflow.

After a while, the interaction starts to feel normal. You stop thinking about what happens behind the scenes and focus on the output. The tool responds, you adjust, it responds again, and the system gradually converges towards something usable.

Somewhere in that loop, one question quietly disappears.

What does this actually cost?

For the past two years, that question was easy to ignore. Token pricing was low enough that it did not show up in any meaningful way, budgets were flexible, and "we use AI" was often treated as a directional decision rather than something requiring cost attribution.

That environment is slowly changing.


From developer time to metered compute

In the past, most of the effort went into understanding a problem well enough to solve it once. You invested time upfront, made a few design decisions, and implemented something you could rely on and reuse. The cost was front-loaded, but once it was done, the solution was stable.

With coding agents, the process feels different. You describe what you want, look at what comes back, adjust your input, and iterate. The solution emerges over a series of interactions rather than a single, well-defined implementation step. Especially when working with new technologies or bootstrapping from scratch, this can feel significantly faster.

Over time, though, the nature of what you are investing in begins to shift. The focus is no longer on understanding, but on guiding a system that approximates the solution through repeated inference. You are not solving the problem once. You are steering the system towards a solution, step by step.

That works remarkably well as long as the cost of each step remains negligible.

Because each iteration is not just interaction — it is computation. Every refinement, every regenerated snippet, every slight variation carries a cost. Unlike understanding, which you pay for once, this cost reappears continuously. It is tied to the process itself, not just the outcome.

As long as that cost stays in the background, the trade-off feels obvious. The moment it becomes visible, the equation starts to look different.


The hidden loop nobody measures

Most coding workflows with agents follow a pattern that looks efficient from the outside but behaves differently under the hood.

You start with a prompt, receive a result, adjust your request, and regenerate. You refine again, maybe ask for a different approach, or fix edge cases. Each step consumes tokens, and each iteration rebuilds context to some extent.
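To make that concrete, here is a rough sketch of how tokens accumulate in such a loop when the growing conversation history is resent on every turn. The context and per-turn sizes are illustrative assumptions, not measurements of any particular tool:

```python
# Token accumulation in an iterative refinement loop where the
# conversation history is resent on every turn.
# Context and per-turn sizes are illustrative assumptions.
base_context = 3_000      # rules + retrieved code, resent each turn
turn_tokens = 800         # one request plus one response

total = 0
history = 0
for turn in range(8):     # eight refinement rounds
    total += base_context + history + turn_tokens
    history += turn_tokens

print(f"tokens billed across 8 turns: {total:,}")
print(f"with no resent history it would be: {8 * (base_context + turn_tokens):,}")
```

The point is the shape, not the numbers: because each turn carries the history of all previous turns, the billed total grows faster than linearly with the number of refinements.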

What appears as a smooth interaction in the interface is, in reality, a loop of probabilistic guesses. The system does not converge in a single step. It explores a space of possible answers, guided by your feedback.

That loop is not a problem in itself. It is a powerful way to work. The issue is that it is rarely measured in terms of cost, and it scales with usage in a way that is not immediately visible.


Stateless systems pretending to understand your code

Most coding agents do not actually understand your codebase in any persistent way. What they do instead is reconstruct a temporary view of your system on every interaction — based on whatever context you provide at that moment, combined with what the tool can retrieve or infer.

To mitigate this, many tools introduced mechanisms like project-level context files. You will see things like CLAUDE.md, rule configurations, or structured prompts that describe coding standards, architecture decisions, naming conventions, and domain-specific constraints. These are useful. Instead of explaining your system from scratch every time, you define a baseline once and reuse it.

However, this comes with a trade-off that is easy to overlook.

These context files are not "remembered" in the way a human would remember them. They are injected into the prompt for every relevant request. From the model’s perspective, there is no persistent memory across calls. Each interaction is stateless, so the only way to maintain consistency is to resend the necessary context again and again.

That means every call is effectively inflated by: the actual user request, the relevant parts of the codebase, and the context from your configuration files. In practice, this is structured repetition. The system is not building knowledge over time. It is rehydrating it on demand.
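A rough per-call breakdown makes the proportions visible. The split below is an illustrative assumption, not a measurement of any specific tool:

```python
# Rough per-call prompt breakdown for a stateless agent request.
# The split is an illustrative assumption, not a measured value.
rules_tokens = 2_000      # CLAUDE.md / rule files, resent on every call
code_tokens = 6_000       # retrieved source snippets
request_tokens = 200      # the actual question

total = rules_tokens + code_tokens + request_tokens
overhead_share = (rules_tokens + code_tokens) / total

print(f"{overhead_share:.0%} of the prompt is rehydrated context")
```

Under these assumptions, the actual question is a rounding error. Nearly the entire prompt is context that was already sent in the previous call, and will be sent again in the next one.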

Think of it like onboarding a new developer for every single task, handing them the same documentation each time, and asking them to solve a slightly different problem. They might perform reasonably well, but the cost of getting them up to speed is paid over and over again.

Put some rough numbers to it. A team of five developers, each running 30 agent interactions per day, each interaction prefixed by a 2,000-token context file, adds up to around 300,000 tokens in context overhead alone — every single day. Before a single line of actual problem description. That is not a theoretical concern. That is the current default usage pattern for any team that has properly configured their tooling.
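The arithmetic behind those numbers, with a token price added as a further assumption:

```python
# Team-scale context overhead, using the figures from the text.
# The price per million tokens is an illustrative assumption.
developers = 5
interactions_per_day = 30
context_tokens = 2_000            # context file injected per interaction
price_per_million = 3.00          # assumed input price, USD per 1M tokens

daily_overhead = developers * interactions_per_day * context_tokens
monthly_overhead = daily_overhead * 22          # ~22 working days

print(f"daily context overhead:   {daily_overhead:,} tokens")
print(f"monthly context overhead: {monthly_overhead:,} tokens")
print(f"monthly cost of overhead: ${monthly_overhead / 1e6 * price_per_million:.2f}")
```

Note that this covers only the configuration file itself, before any retrieved code or conversation history, and it scales linearly with team size and interaction frequency.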

What makes this even harder to reason about is that most tools deliberately abstract the underlying cost model. You rarely see tokens per interaction. At best, you get a rough quota or a monthly usage indicator in the settings. The actual unit of cost stays hidden in day-to-day usage.

As a result, we are not trained to think in terms of efficient input. We do not optimize for concise context. We do not question whether every piece of information needs to be sent again. We write prompts the same way we write documentation for humans, even though the economics are completely different.

The irony is that the mechanisms designed to make these tools more reliable also make them more expensive. The better you define your rules and structure, the more context you have to send per interaction. Until these systems move towards truly stateful architectures, this pattern will remain. They do not accumulate understanding. They simulate it, one prompt at a time.


The uncomfortable trade-off

A developer who understands a pattern implements it once and reuses it with confidence. The structure becomes part of their mental model. The next time a similar problem appears, the solution is not rediscovered — it is applied, often with small and deliberate adjustments.

A coding agent approaches the same situation differently. It generates a solution based on the current prompt and the context it receives at that moment. If you ask again, even for something very similar, it goes through the same process once more. The result might look familiar, but it is still a fresh approximation — influenced by how the request is phrased, which parts of the context were included, and what the model considers most likely in that specific interaction.

At first glance, both approaches lead to working code. The difference only becomes visible over time.

With human understanding, the cost is paid upfront. There is an initial investment in thinking, structuring, and working through edge cases. Once that investment is made, the marginal cost of applying that knowledge drops significantly. You are not solving the same problem again. You are reusing something you already understand.

With coding agents, the cost is distributed across usage. Every time you need a variation, you invoke the system again. Even if the pattern is simple or already known, the model still has to reconstruct context, derive a solution, and produce output. You are not reusing understanding. You are re-triggering computation.

In the short term, repeated inference can feel cheaper than investing time into deeper understanding — especially when deadlines are tight or when working in unfamiliar territory. Over time, however, the same patterns reappear. The same classes, the same transformations, the same architectural decisions. If each of these is generated again instead of applied from existing knowledge, the cost accumulates quietly in the background.
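One way to see how the recurring cost compounds is a toy break-even calculation. Every figure here, the hours, the hourly rate, the per-call cost, is a made-up assumption; the point is the shape of the comparison, not the specific numbers:

```python
# Toy break-even: one-time understanding vs. repeated generation.
# All figures are made-up assumptions; only the shape matters.
upfront_hours = 4.0        # time to understand and implement a pattern once
hourly_rate = 80.0         # USD
reuse_cost = 0.05          # near-zero marginal cost of applying that knowledge

per_generation = 1.50      # USD per agent round-trip, incl. retries and context

upfront = upfront_hours * hourly_rate
uses = 1
while upfront + uses * reuse_cost > uses * per_generation:
    uses += 1

print(f"repeated generation becomes more expensive after ~{uses} uses")
```

Under these assumptions the break-even sits in the low hundreds of uses, which sounds distant until you count how often the same pattern recurs across a codebase over a year.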

That is the nature of recurring cost. It does not hurt at the beginning. It compounds over time, in places where nobody is explicitly looking.

At some point, the question is no longer whether the agent can generate the code. It is whether generating it again is still the most efficient option.


The skill atrophy nobody talks about

There is a second cost that does not show up in token counts at all.

When a team routes enough reasoning through an agent, something shifts in how problems get approached. The question stops being "how do we design this" and starts being "how do we prompt this." That is a different cognitive mode, and it gradually displaces the other one.

This is not about any single developer losing their edge. It is about teams collectively losing the habit of reasoning through problems from first principles. Architecture decisions get made by accepting what the agent produces rather than by building a mental model of what the system actually needs. Edge cases get handled by regenerating until the output looks right, rather than by understanding why they occur.

The output can remain functional for a long time before the structural debt becomes visible. And when it does, the team that stopped reasoning is not well-positioned to untangle it — because the understanding was never built in the first place.

Understanding compounds. When you stop building it, you do not stand still. You slowly lose the foundation that makes complex decisions legible.


The part we are currently underestimating

At the moment, many of these workflows feel economically viable because the underlying pricing is still in a phase of aggressive adoption.

Companies like OpenAI and Anthropic are not only selling tokens. They are building ecosystems, capturing market share, and accelerating integration into development pipelines. In that phase, pricing does not need to fully reflect long-term cost structures. The priority is not margin optimization — it is widespread adoption and dependency.

This is also visible in how pricing is packaged. Flat-rate subscriptions, generous quotas, and seemingly low per-token costs create the impression that usage is almost negligible. From a user perspective, this lowers friction and encourages exploration. From a provider perspective, it increases dependency and embeds the tooling deeper into development processes.

Large-scale inference on GPU infrastructure is not free, and it does not become free because it is bundled into a subscription. A flat rate of a few hundred dollars per month might feel predictable, but it cannot indefinitely absorb unbounded usage — especially when workflows involve continuous iteration, large context windows, and repeated regeneration.
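A back-of-the-envelope check of that claim. All figures below are assumptions; the provider-side cost per million tokens, in particular, is not public:

```python
# How much iterative usage can a flat-rate seat absorb?
# Every number below is an illustrative assumption.
subscription = 200.0              # USD per seat per month
tokens_per_interaction = 40_000   # large context + regeneration, in and out
cost_per_million = 5.0            # assumed blended compute cost, USD / 1M tokens
working_days = 22

per_interaction = tokens_per_interaction / 1e6 * cost_per_million
break_even = subscription / per_interaction

print(f"flat rate covers ~{break_even:.0f} interactions per month "
      f"(~{break_even / working_days:.0f} per working day)")
```

Under these assumptions, a heavy agent-driven workflow, where a single task can burn dozens of round-trips, exceeds the flat rate comfortably. The subsidy holds only as long as average usage stays below that line.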

If LLM providers charged the full cost of compute today, a large part of the current usage patterns would look very different. Some of the rapid expansion we are seeing would likely slow down — not because the technology is less useful, but because the economics would force more selective and intentional usage.

We are currently operating in a window where experimentation is deliberately made cheaper, and where the true cost of certain workflows is partially hidden behind pricing models optimized for growth rather than long-term equilibrium.

At some point, that alignment will tighten. And when it does, the question will no longer be whether these tools are useful, but where they remain economically sensible.


What changes next

As pricing and cost structures converge, a more balanced usage pattern will likely emerge.

There is a clear sweet spot where coding agents provide real value: bootstrapping new features, exploring unfamiliar domains, accelerating repetitive work. These are areas where the time saved clearly outweighs the cost. At the same time, there are tasks where a human developer is still more efficient, more precise, and ultimately cheaper — especially for well-understood patterns or critical system components.

As this balance becomes more visible, some teams that have built extensive internal workflows on top of LLMs may need to reassess. What looks sustainable under today’s pricing might not hold under different assumptions. That could mean a shift back towards standardized tooling in some areas, or simply a renewed appreciation for developers who can solve problems without relying on repeated inference.

Neither outcome is a failure. Both are a more honest accounting of what these tools actually cost.


Closing thought

I am glad I can still write code.

Not because I am against AI, but because I have seen what happens when you replace understanding with iteration. It feels fast at first — until you realize you are solving the same problem again and again, just with slightly different prompts. And that the team around you has quietly stopped asking why.

The tools are real. The speed is real. But so is the dependency you are building, and so is the subsidy that is currently making it feel lighter than it is.

Understanding compounds. Guessing gets billed.

As long as someone else absorbs part of that bill, it looks like progress. When that changes, efficiency will mean something different again.
