Agentic AI broke the token-based economic model. But it’s always been broken.
By Alan Jacobson, AI Economics Strategist
Everyone is building dashboards.
Beautiful ones. Clean charts. Real-time graphs. Observability everywhere.
But what’s the point if what you are seeing is missing the point?
The entire system is built on a unit that cannot see what matters.
If you are counting tokens, you are flying blind for two reasons:
Semantic blindness: Tokens count pieces of text (words and word fragments), but they don’t understand meaning. A simple request and a complex task may use the same number of tokens while requiring very different amounts of compute.
Invisibility: Agentic workflows perform work off-screen — retrieval, tool use, iteration — that never becomes tokens at all.
So tokens measure neither the meaning of the work nor the full amount of work performed.
You are looking at the instrument panel, but half the gauges are disconnected — and the ones that work are mislabeled.
Now watch how this breaks in practice.
Consider these two scenarios:
A guy talks to AI for thirty minutes about his girlfriend. He goes on and on…
How she seems distant.
How she is slow to respond to texts.
How she is mysteriously unavailable.
The system dutifully transcribes every word, responds empathetically and consumes a massive number of tokens — all while avoiding the four words a human would scream immediately:
SHE’S CHEATING ON YOU!
High token count. Low cognitive demand. Minimal real compute required.
Now consider a three-word query:
“Is God real?”
Few questions demand more reasoning, context, philosophy and depth. Yet under token-based billing, that interaction may never recover the cost of compute.
Low token count. Massive cognitive demand. High compute requirement.
And in both cases, look at the asymmetry between input, output and effort.
There is no reliable correlation between the number of tokens, in or out, and the compute required.
That’s the first failure: tokens are semantically blind.
They treat every word as if it carries the same cognitive weight, the same inferential load, the same computational demand.
But meaning is not evenly distributed across language.
Some words are pebbles.
Some are boulders.
Tokens weigh them the same.
Now layer in the second failure — the one that agentic AI just exposed.
Work is no longer confined to what you can see.
Agents retrieve data.
They call tools.
They iterate.
They branch.
They retry.
They execute entire chains of logic in the background.
And much of that work produces no tokens at all.
No visible output.
No billable unit.
No trace in the system you’re using to measure cost.
| Step | Workflow A: Simple prompt/response | Workflow B: Agentic workflow |
|---|---|---|
| 1 | User prompt: “Summarize this document”. Tokens generated: 25. Compute: single inference pass | User prompt: “Summarize this document”. Tokens generated: 25. Compute: initial inference |
| 2 | Model produces summary. Tokens generated: 200. Compute: single inference pass | Agent retrieves 5 documents for context. Tokens generated: 0. Compute: embedding + vector search + ranking |
| 3 | | Agent evaluates relevance, discards 3 docs, keeps 2. Tokens generated: 0. Compute: multiple inference passes |
| 4 | | Agent attempts a first summary draft and determines it is insufficient. Tokens generated: 0. Compute: inference pass |
| 5 | | Agent retries with a different prompt strategy. Tokens generated: 0. Compute: inference pass |
| 6 | | Agent calls an external metadata tool. Tokens generated: 0. Compute: API call + processing |
| 7 | | Agent produces final summary. Tokens generated: 200. Compute: final inference pass |
| Total | Tokens: 225. Actual work: 1 pass | Tokens: 225. Actual work: 5-7 inference passes + retrieval + tool use |

| What billing sees vs. what actually happened | Workflow A | Workflow B |
|---|---|---|
| Tokens billed | 225 | 225 |
| Inference passes | 1 | 5-7 |
| Retrieval ops | 0 | Multiple |
| Tool/API calls | 0 | 1+ |
| Actual compute | Low | Significantly higher |
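
To make the comparison’s arithmetic concrete, here is a minimal Python sketch that models each workflow as a list of steps and tallies what a token meter records against what was actually done. The `Step` class and the `billed_tokens` and `actual_work` helpers are hypothetical, written only for illustration. Where the comparison says “multiple”, the counts used here (two relevance passes, and three retrieval operations standing for embedding, vector search and ranking) are assumptions chosen to fall inside its 5-7 pass range.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step of a workflow, mirroring a row of the comparison table."""
    description: str
    tokens: int                # tokens a token-based meter would count
    inference_passes: int = 0  # model forward passes performed in this step
    retrieval_ops: int = 0     # embedding, vector search and ranking operations
    tool_calls: int = 0        # external tool or API invocations


def billed_tokens(steps: list[Step]) -> int:
    """What token-based billing sees: the token total and nothing else."""
    return sum(s.tokens for s in steps)


def actual_work(steps: list[Step]) -> dict[str, int]:
    """Everything the workflow did, tallied by type of operation."""
    return {
        "tokens": billed_tokens(steps),
        "inference_passes": sum(s.inference_passes for s in steps),
        "retrieval_ops": sum(s.retrieval_ops for s in steps),
        "tool_calls": sum(s.tool_calls for s in steps),
    }


# Workflow A: simple prompt/response, one inference pass in total.
workflow_a = [
    Step("User prompt: 'Summarize this document'", tokens=25),
    Step("Model produces summary", tokens=200, inference_passes=1),
]

# Workflow B: agentic. Counts standing in for "multiple" are assumptions.
workflow_b = [
    Step("User prompt: 'Summarize this document'", tokens=25, inference_passes=1),
    Step("Retrieve 5 documents for context", tokens=0, retrieval_ops=3),
    Step("Evaluate relevance, keep 2 of 5", tokens=0, inference_passes=2),
    Step("First summary draft, judged insufficient", tokens=0, inference_passes=1),
    Step("Retry with a different prompt strategy", tokens=0, inference_passes=1),
    Step("Call external metadata tool", tokens=0, tool_calls=1),
    Step("Produce final summary", tokens=200, inference_passes=1),
]

for name, steps in [("A", workflow_a), ("B", workflow_b)]:
    print(name, "billed:", billed_tokens(steps), "actual:", actual_work(steps))
```

Both workflows report 225 billed tokens. Under these assumptions, the tallies underneath differ: one inference pass and no retrieval or tool calls for A, versus six inference passes, three retrieval operations and one tool call for B.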


So now you have a system where:
- visible work is mis-measured
- invisible work is not measured at all
And you’re supposed to manage cost, pricing, and margin on top of that.
It’s like trying to run a power grid by counting lightbulbs instead of measuring electricity.
Or flying a plane through clouds with a windshield painted over — relying on instruments that don’t reflect reality.
The dashboards will look precise.
The numbers will update in real time.
The charts will feel authoritative.
But the underlying signal is corrupted.
Because tokens don’t map to cost.
So they can’t be used to control it.
Agentic AI didn’t break the token model.
It revealed what was already true:
The system was blind from the start.
– Published on Thursday, March 26, 2026