The AI honeymoon is almost over
For the past two years, AI has felt like magic.
- Costs looked manageable.
- Capabilities kept improving.
And the narrative was simple: build with AI, and the economics will figure themselves out.
That narrative is starting to break.
Not because the models stopped improving.
But because the economics underneath them never worked to begin with.
The problem isn’t the model
The AI industry keeps trying to solve a cost problem by changing models:
- switch providers
- fine-tune smaller models
- optimize prompts
- reduce context
But none of these address the real issue.
The problem is not the model.
The problem is the unit used to measure cost.
Tokens were never the right unit
Today, AI is still billed primarily by the token.
Tokens are easy to count. But they fail in two structural ways:
- Semantically blind: The same number of tokens does not represent the same amount of work. A trivial rewrite and a complex reasoning task can generate similar token counts while consuming vastly different compute.
- Operationally blind: Agentic workflows perform substantial work off-screen (retrieval, tool calls, loops, retries, background execution). Much of that work never becomes tokens at all.
So what looks like “usage” is only a partial view of what is actually happening.
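To make both kinds of blindness concrete, here is a minimal Python sketch. Every number in it is invented, and the 2 × params × tokens figure is only a standard rough estimate of transformer forward-pass FLOPs:

```python
# Two requests with identical billable tokens but very different compute.
# All numbers are invented; 2 * params * tokens is a standard rough
# estimate of transformer forward-pass FLOPs.

PARAMS = 70e9  # hypothetical 70B-parameter model


def forward_flops(tokens: int) -> float:
    """Rough FLOPs for one forward pass over `tokens` tokens."""
    return 2 * PARAMS * tokens


# Request A: a trivial rewrite. One pass, 500 billable tokens.
a_compute = forward_flops(500)

# Request B: an agentic task. The same 500 billable tokens, plus
# 8 hidden passes (retrieval, tool calls, retries) of ~2,000 tokens
# each that never appear on the bill.
b_compute = forward_flops(500) + 8 * forward_flops(2_000)

print(f"A: {a_compute:.1e} FLOPs")
print(f"B: {b_compute:.1e} FLOPs ({b_compute / a_compute:.0f}x A)")
```

Both requests produce the same bill. Request B performs roughly 33 times the compute.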
The hidden gap
If cost is driven by compute, and revenue is tied to tokens, then there is a gap.
Some portion of real work is not being properly measured or priced.
That gap doesn’t disappear.
It gets absorbed.
Right now, it is being absorbed by the providers.
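A back-of-the-envelope sketch of how that absorbed gap scales, with both per-request figures invented for illustration:

```python
# Back-of-the-envelope: how an absorbed per-request gap scales.
# Both per-request dollar figures are invented for illustration.
token_revenue_per_req = 0.002  # what token billing captures
compute_cost_per_req = 0.005   # what the compute actually costs

gap = compute_cost_per_req - token_revenue_per_req  # absorbed per request

for monthly_requests in (1e6, 1e8, 1e10):
    absorbed = gap * monthly_requests
    print(f"{monthly_requests:.0e} requests/month: ${absorbed:,.0f} absorbed")
```

Per request, the gap is pocket change. At platform volume, it compounds into real money.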
The honeymoon phase
This is the honeymoon.
- generous pricing
- free tiers
- predictable bills
- rapid adoption
The goal is clear: get developers building.
And it’s working.
AI usage is exploding. Companies are investing hundreds of billions of dollars in infrastructure to support it.
But the underlying economics are under strain.
Costs are rising with usage — even as per-token prices fall.
And companies are already reporting that AI spend is growing faster than revenue.
Why this doesn’t hold
AI doesn’t behave like SaaS. It behaves like a utility: every request carries real marginal compute cost, so spend scales with usage instead of amortizing away.
Not amortized.
Not flattened.
Compounded.
And as agentic systems expand, more work happens off-screen — outside the token boundary.
Which means the gap between measured usage and actual cost grows wider.
At scale, that gap becomes impossible to ignore.
What happens next
When the economics don’t hold, the system adjusts.
We’ve seen this before:
- limits get introduced
- pricing gets restructured
- access gets constrained
- “fair use” policies tighten
Not because companies want to restrict usage.
Because they have to.
Don’t change the model. Change the unit of measure.
The industry is trying to fix this by changing models.
That’s the wrong layer.
You don’t need a new model to fix AI economics.
You need a new meter.
FLOP-Based Metering (FBM) does exactly that.
It does not require changing the LLM.
It sits between the user and the model as a measurement and control layer.
Instead of counting tokens, it measures the compute work actually performed.
Including the work tokens never see.
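The post doesn’t spell out an FBM interface, so the following is a hypothetical sketch under assumptions: a meter that charges every model pass, on-screen or off, to a compute ledger, using ~2 FLOPs per parameter per token as a rough forward-pass estimate.

```python
# A hypothetical sketch of an FBM-style meter. The interface and the
# per-pass estimate are assumptions, not a published FBM specification.


class FlopMeter:
    """Charges every model pass, visible or hidden, to a compute ledger."""

    def __init__(self, params: float):
        self.params = params  # size of the model being metered
        self.used = 0.0       # running total of estimated FLOPs

    def record(self, tokens: int) -> float:
        # ~2 FLOPs per parameter per token: a rough transformer
        # forward-pass estimate. Hidden passes are charged identically.
        flops = 2 * self.params * tokens
        self.used += flops
        return flops


meter = FlopMeter(params=70e9)
# One visible pass plus three off-screen passes (retrieval, tool
# calls, a retry), each metered the same way.
for step_tokens in (500, 2_000, 2_000, 1_200):
    meter.record(step_tokens)
print(f"{meter.used:.2e} FLOPs of work actually performed")
```

The point is placement: because the meter wraps every pass, the off-screen work that never becomes tokens still gets measured.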
The real shift
This is not a pricing tweak.
It is a foundational change.
Because once the unit of measure changes:
- cost becomes predictable
- budgets become enforceable (see the sketch below)
- governance becomes possible
- pricing aligns with actual work
And the economics finally make sense.
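As one illustration of the budget point above, here is a hedged sketch of enforcement (hypothetical interface, invented numbers): an allowance denominated in FLOPs, checked before each pass runs.

```python
# A hypothetical sketch of budget enforcement on a FLOP meter.
# The interface and all figures are invented for illustration.


class FlopBudget:
    """Refuses a model pass once an estimated-compute allowance is spent."""

    def __init__(self, params: float, budget_flops: float):
        self.params = params
        self.remaining = budget_flops

    def charge(self, tokens: int) -> None:
        flops = 2 * self.params * tokens  # rough forward-pass estimate
        if flops > self.remaining:
            raise RuntimeError("FLOP budget exhausted: pass refused")
        self.remaining -= flops


budget = FlopBudget(params=70e9, budget_flops=1e15)
try:
    for step_tokens in (2_000, 2_000, 2_000, 2_000):
        budget.charge(step_tokens)
except RuntimeError as err:
    print(err)  # the fourth pass is refused up front, not absorbed
```

Because the check precedes the pass, the overrun is refused up front rather than absorbed and reconciled later.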
The bottom line
The current system works because the gap is being absorbed.
That is the honeymoon.
But as usage scales, that gap grows.
And when it becomes too large to ignore, the system will be forced to change.
Not the model.
The unit of measure.
– Published on Saturday, March 28, 2026