Agentic AI renders tokens obsolete

This week, stories in The New York Times and The Wall Street Journal highlighted something that’s been quietly building inside companies: employees are deploying AI agents that generate massive volumes of tokens—and massive, unexpected costs.

The assumption behind most AI systems is simple: tokens approximate usage, and usage approximates cost.

That assumption is false. And recent 10-Q filings show active and emerging market compression across 25 public cloud companies following AI deployments.

The problem

Most AI systems are billed using tokens.

Tokens are simple: they are chunks of text the model reads and writes, input from the user and output from the model.

So billing is based on how much text is generated.

That worked when AI was:

  • chat-based
  • linear
  • human-driven

What changed

AI is no longer just chat.

It’s now agents that:

  • plan
  • reason
  • call tools
  • execute multi-step workflows

And critically: they don’t need to write words to the screen to do work.

The break due to agentic AI

  • Significant compute can happen without proportional text output
  • Multiple internal steps may produce very few tokens
  • Two tasks with similar tokens can require very different compute

The real risk

AI cost behaves differently from SaaS cost:

  • Cost scales with usage in real time
  • Usage is no longer bounded by people
  • Agents can run continuously in the background

Result: cost shows up before it can be controlled.

The evidence

What’s becoming clear from teams in production is that agent workflows are driving cost through iteration, retries, and tool use in ways tokens don’t fully capture.

So even with good visibility, the economic behavior shows up after the fact — not before.

What CFOs actually need

Not better dashboards. Not more optimization. Finance needs:

  • Predictability
  • Control before spend occurs
  • A billing unit that reflects cost

The new framework to support agentic AI

  1. Measure compute, not tokens
    Shift from surface-level proxies, such as tokens, to the underlying resource being consumed. What matters is not how much text is generated, but how much compute is actually required to produce it.
  2. Normalize into a standard unit
    Translate that compute into a consistent, comparable unit so it can be tracked, budgeted, and governed across models, workflows, and environments.
  3. Control before execution (G-PEP)
    Introduce a governed, pre-execution control layer (G-PEP) that evaluates every request before it runs and has the authority to approve, constrain, or block inference entirely unless predetermined conditions are met.
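Step 2 can be sketched in a few lines. The unit definition and per-model weights below are illustrative assumptions, not figures from this article: here one compute unit (CU) is defined as one second on a reference accelerator, and other hardware is weighted relative to it.

```python
# Hypothetical normalization: convert raw resource use into one "compute unit" (CU),
# defined here as one second on a reference accelerator. The model names and
# weights are assumptions chosen for illustration only.
REFERENCE_WEIGHTS = {
    "small-model": 0.2,     # runs on cheaper hardware, fewer CUs per second
    "large-model": 1.0,     # the reference point
    "frontier-model": 4.0,  # heavier hardware, more CUs per second
}

def to_compute_units(model: str, accelerator_seconds: float) -> float:
    """Normalize measured accelerator time into comparable compute units."""
    return REFERENCE_WEIGHTS[model] * accelerator_seconds
```

Once every workflow reports in the same unit, budgets can be compared across models and environments rather than per-API, per-token.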

Before any task executes, the system must:

  • Estimate required compute
  • Apply budget, policy, and governance limits
  • Approve, throttle, or reject execution

This is where real control happens—not after the fact, but at the moment of intent.
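The gate described above can be sketched as a single decision function. Everything here is a minimal illustration of the idea, not the G-PEP implementation: the `Request` and `Decision` types, the field names, and the assumption that an upstream planner supplies a compute estimate are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    user: str
    estimated_cu: float  # compute estimate from a planner/profiler, not a token count

@dataclass
class Decision:
    action: str  # "approve", "throttle", or "reject"
    reason: str

def pre_execution_gate(req: Request,
                       budget_remaining_cu: float,
                       per_request_cap_cu: float) -> Decision:
    """Evaluate a request *before* it runs: estimate, check limits, then decide."""
    if req.estimated_cu > budget_remaining_cu:
        return Decision("reject", "insufficient remaining budget")
    if req.estimated_cu > per_request_cap_cu:
        return Decision("throttle", "exceeds per-request cap; constrain the plan")
    return Decision("approve", "within budget and policy")
```

The key design choice is that the decision happens at the moment of intent: nothing is spent until the gate approves, so an over-budget task is never executed and then discovered on an invoice.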

User-level governance
Control doesn’t sit only at the system level. The end organization defines policies that govern how individual employees, agents, and workflows consume compute—ensuring no single user, team, or autonomous process can run up unbounded cost.
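A minimal sketch of that user-level layer, assuming the same hypothetical compute unit as above: each user, agent, or workflow gets its own budget, and spend is reserved at approval time so an autonomous process cannot exceed its allocation.

```python
from collections import defaultdict

class UserBudgets:
    """Per-user compute budgets (in compute units), debited at approval time.

    Illustrative only: a real policy layer would also cover teams, agents,
    and workflows, and would persist state outside the process.
    """
    def __init__(self, default_cu: float):
        self.remaining = defaultdict(lambda: default_cu)

    def try_spend(self, user: str, cost_cu: float) -> bool:
        if cost_cu > self.remaining[user]:
            return False  # block: this request would exceed the user's budget
        self.remaining[user] -= cost_cu  # reserve *before* execution
        return True
```

Because the debit happens before execution, a background agent that exhausts its budget is simply refused its next task rather than discovered after the spend.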

The principle
If cost cannot be estimated before execution, it cannot be controlled.

What doesn’t work

Systems that rely on:

  • tokens
  • post-execution monitoring
  • optimization alone

None of these provides cost control, because each one observes cost only after it has been incurred.

Bottom line

AI is not SaaS. It behaves like a utility.

If pricing doesn’t track compute, cost cannot be controlled.

Tokens don’t map to cost—so they can’t be used to control it.

– Published on Sunday, March 22, 2026



© 2025 BrassTacksDesign, LLC