LLMs can’t scale their way to profitability
Costco sells millions of rotisserie chickens every year — at a loss. Scale doesn’t recover the cost – and it was never meant to – because the chicken is a loss leader subsidized by everything else in the cart.
But LLMs can’t sell inference at a loss, because they have nothing else to sell to recover the cost.
There are only two ways the LLM industry escapes its current math:
- Either demand scales fast enough to justify the cost, or
- Unit economics improve fast enough to support the demand.
But both keys must turn, and neither is turning.
The simple constraint everyone skips
For most SaaS providers, scale leads to profitability because each additional user doesn’t add as much cost as the initial user. The software is already built, the infrastructure is shared and the incremental work per click is small. As usage grows, costs per user fall.
LLMs don’t work that way.
Every LLM request burns a lot of compute – that’s why LLM providers need so many datacenters while most SaaS providers do not.
And the compute that LLMs require happens fresh, every time, at the moment you ask the question – the tasks don’t get smaller with scale.
So when usage goes up, costs go up with it.
Scale doesn’t make LLMs cheaper.
It makes the bill bigger.
That single fact breaks the idea that LLMs can “grow their way” to profitability.
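The contrast is easy to see with toy numbers. A minimal sketch (all figures invented for illustration – these are not real vendor costs):

```python
# Illustrative only: toy numbers, not real vendor economics.

def saas_cost_per_user(users, fixed_cost=1_000_000, marginal_cost=0.05):
    """Classic SaaS: a big one-time build cost amortized across users,
    plus a tiny marginal cost per user."""
    return fixed_cost / users + marginal_cost

def llm_cost_per_user(users, inference_cost=4.00):
    """LLM inference: each user's requests burn fresh compute every time,
    so per-user cost stays flat no matter how many users join."""
    return inference_cost  # does not fall with scale

for users in (10_000, 100_000, 1_000_000):
    print(f"{users:>9,} users | SaaS: ${saas_cost_per_user(users):6.2f}/user"
          f" | LLM: ${llm_cost_per_user(users):5.2f}/user")
```

The SaaS curve falls toward its tiny marginal cost as more users amortize the build; the LLM curve never falls, because the compute happens fresh on every request.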
Key one: demand can never scale enough
AI adoption is not accelerating. It stalled through Q2, Q3 and Q4 of 2025.
But even if adoption did surge, it wouldn’t fix the problem — because higher usage doesn’t lower costs.
This is the part almost no one says out loud.
- Every new user adds cost.
- Every new query adds cost.
- Every new “improvement” adds even more cost.
- Better models cost more to run.
- Agents cost more than chat.
- Personalization costs more than generic answers.
So demand can never outrun cost. It can only chase it.
When people say “scale will fix the economics,” they’re assuming a cost curve that simply isn’t there.
It never was.
That’s key one.
Why demand stalls anyway
Adoption also stalls for a more basic reason: trust – or, more specifically, the lack of it.
- Systems that don’t remember reliably can’t be trusted.
- Systems users can’t control can’t be integrated into real work.
- Systems that hallucinate or forget force humans back into checking and correcting.
That kills adoption.
- An assistant you have to babysit isn’t an assistant.
- An agent that forgets context isn’t an agent.
- And a system you can’t customize to your own rules will never become infrastructure.
This is not hypothetical:
- This is why Google’s addition of Gemini buttons to Chrome won’t fix anything; it will only hasten abandonment at scale. More users will try, then walk away disappointed – the same way hallucinations chased people away from Gemini 1.0 and 1.5. (For reference, Gemini 3’s hallucination rate is 88%.)
- This is why “agents” don’t magically unlock demand. This is why enterprise pilots stall instead of expand. Just ask Salesforce.
Without 100% lossless memory and user-controlled governance, adoption plateaus — no matter how good the model gets.
Key two: unit economics get worse with scale, not better
Even if demand were infinite, the math still doesn’t close.
- Flat-rate pricing will never work, because no two users impose the same costs.
- Token-based pricing is not a real economic model. It’s a billing shortcut.
Tokens measure how much text moves around, not how much work the system actually does. They ignore reasoning depth, time spent thinking, retrieval, orchestration, memory handling, tool use and verification.
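To make the mismatch concrete, here is a hedged sketch (every per-step dollar figure is invented) of two requests that bill identically by tokens but cost the provider very differently:

```python
# Hypothetical numbers for illustration; real per-step costs vary by provider.

TOKEN_PRICE = 0.00001  # $ per token billed to the user

def token_bill(tokens):
    """What the user pays: a function of text volume alone."""
    return tokens * TOKEN_PRICE

def actual_cost(tokens, reasoning_passes=1, tool_calls=0, retrievals=0):
    """What the provider spends: every reasoning pass re-runs the model,
    and tool use / retrieval add work the token meter never sees."""
    compute = tokens * 0.000008 * reasoning_passes
    return compute + tool_calls * 0.002 + retrievals * 0.001

chat  = dict(tokens=2_000)                                   # simple Q&A
agent = dict(tokens=2_000, reasoning_passes=6, tool_calls=4, retrievals=3)

for name, req in (("chat", chat), ("agent", agent)):
    billed = token_bill(req["tokens"])
    spent  = actual_cost(**req)
    print(f"{name:>5}: billed ${billed:.4f}, spent ${spent:.4f}, "
          f"margin {billed - spent:+.4f}")
```

Same token bill, opposite margins: the agent request runs six reasoning passes plus tool and retrieval work, none of which token metering captures.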
- As LLM systems get more capable, they do more work per request.
- As they do more work, costs rise.
- As costs rise, margins shrink.
That’s the opposite of how people think scale is supposed to work.
Scale doesn’t make each unit cheaper when each unit gets heavier.
It just makes the losses larger.
- That’s why inference costs rise with “better” models.
- That’s why agents are more expensive than chat.
- That’s why memory and governance increase compute.
- And that’s why ads inside LLMs aren’t monetization — they’re a desperate attempt to subsidize a model that can’t support itself.
Even perfect adoption wouldn’t save token economics.
More usage just accelerates the burn.
That’s key two.
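A toy burn model (numbers invented purely for illustration) shows why more usage accelerates the loss rather than diluting it:

```python
# Toy model: when per-request margin is negative, growth scales the loss.

PRICE_PER_REQUEST = 0.02   # hypothetical revenue per request
COST_PER_REQUEST  = 0.05   # heavier units: agents, memory, verification

def monthly_burn(requests):
    """Total loss: a negative per-unit margin multiplied by volume."""
    return requests * (COST_PER_REQUEST - PRICE_PER_REQUEST)

for r in (1_000_000, 10_000_000, 100_000_000):
    print(f"{r:>11,} requests -> burn ${monthly_burn(r):,.0f}")
```

Because the per-request margin is negative, every order of magnitude of growth multiplies the burn by the same factor; there is no volume at which the loss dilutes.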
Why “scale will fix it” no longer holds
The industry is betting that adoption will re-ignite, costs will flatten and revenue will catch up.
But those bets contradict each other.
- If adoption grows, costs rise.
- If costs rise, margins collapse.
- If margins collapse, ads appear.
- If ads appear, trust erodes.
- If trust erodes, adoption stalls again.
That’s not a flywheel.
That’s a loop.
And loops don’t compound. They exhaust.
The constraint, not the prediction
This isn’t a forecast.
It’s a constraint.
To escape it, the industry must do these four things:
1. Restore trust with 100% lossless memory
2. Provide users with personalization and agency via user-controlled governance
3. Replace token pricing with billing that reflects the actual work being done
4. Deploy pre-execution provisioning to reduce compute costs by up to 90%
Without all four, LLMs can’t scale profitably.
With only one, it still fails.
That’s the lock. Items 1–4 above are the keys.
Why markets will care
Markets can tolerate hype.
They can tolerate losses.
They can tolerate long timelines.
What they can’t tolerate is a business model that gets worse as it grows.
Once that becomes impossible to ignore — through earnings, guidance, capex disclosures or concentration risk — repricing isn’t optional. It’s mechanical.
The only real question left is not whether it happens, but who admits it first.