An open letter to Jeff Bezos: If you provision AI, you can cut compute costs by up to 90%
Dear Jeff,
You’ve never been afraid to spend aggressively when the economics justify it.
But even a 19th-century railroad baron would have balked at the way AI compute is being provisioned today.
Amazon was built on a simple idea: anything that scales must get cheaper.
That principle turned retail margins into logistics mastery. It turned AWS into the backbone of the internet. It turned cost discipline into a competitive weapon.
AI breaks that rule.
At Amazon, AI doesn’t fail for lack of adoption. It fails because the cost curve runs the wrong way. Inference costs rise with usage. Energy costs rise with scale. Capex rises before revenue. The more “intelligent” the system becomes, the more expensive it is to operate.
Alexa was the warning shot.
AI is the main event.
Today, intelligence behaves like an unpriced utility inside Amazon — everywhere, indispensable and economically invisible. That is not a sustainable equilibrium for a company built on first principles.
- AI that gets more expensive as it scales violates Amazon’s core operating philosophy.
- AI without accurate cost attribution turns optimization into guesswork (see the sketch after this list).
- AI systems that organizations cannot predict, constrain, or budget for are slow-walked, sandboxed or quietly shelved.
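To make the attribution point concrete, here is a minimal sketch. The model rates, team names and the meter itself are invented for illustration; the point is that every inference call lands on a ledger with an owner, so optimization runs on data instead of guesswork.

```python
from collections import defaultdict

# Invented per-1K-token rates; real rates vary by model and provider.
RATE_PER_1K_TOKENS = {"large-model": 0.030, "small-model": 0.002}

ledger = defaultdict(float)  # (team, feature) -> dollars spent so far

def record_inference(team: str, feature: str, model: str, tokens: int) -> float:
    """Attribute the cost of one inference call to the team that incurred it."""
    cost = tokens / 1000 * RATE_PER_1K_TOKENS[model]
    ledger[(team, feature)] += cost
    return cost

record_inference("retail-search", "query-rewrite", "large-model", 1800)
record_inference("alexa", "intent-routing", "small-model", 400)
print(dict(ledger))  # every dollar now has an owner
```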
Which brings us to the three problems Amazon must solve before it can align with users and get costs under control.
1. Memory
LLMs survive their memory limits the same way JPEGs survived slow networks: through lossy compression.
JPEGs throw away pixels. LLMs throw away facts.
At first glance, the loss isn’t obvious. But look closely and the seams appear: blurred edges, missing detail, artifacts that weren’t visible at first. With LLMs, those artifacts are missing facts and broken continuity.
What JPEGs lose are pixels.
What LLMs lose is truth.
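The mechanism is easy to see in miniature. The toy sketch below is not any vendor’s implementation; it just shows how a fixed context budget forces the oldest turns out, and whatever facts lived there go with them.

```python
def compress_context(turns: list[str], budget_tokens: int) -> list[str]:
    """Toy 'lossy compression': keep only the most recent turns that fit.
    Anything older than the budget is discarded -- facts included."""
    kept, used = [], 0
    for turn in reversed(turns):        # newest first
        tokens = len(turn.split())      # crude stand-in for a real tokenizer
        if used + tokens > budget_tokens:
            break                       # everything older is lost
        kept.append(turn)
        used += tokens
    return list(reversed(kept))

history = [
    "User: my order number is 112-7788990.",  # the fact
    "Assistant: noted.",
    "User: " + "some long unrelated discussion " * 30,
    "User: what was my order number?",
]
window = compress_context(history, budget_tokens=60)
print("112-7788990" in " ".join(window))  # False: the fact was compressed away
```

The model never chose to forget the order number. The budget chose for it.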
Without 100% lossless memory, AI cannot be trusted. Without trust, there is no adoption. Without adoption, there is no scale. And without scale, the market caps tied to AI infrastructure evaporate.
If you believe memory is a problem you can solve later, please know that a solution to this problem has been filed and is patent pending.
2. Governance
Enterprises will not adopt systems they cannot control. And users need agency as well: over how AI behaves, when it escalates, when it refuses and how it explains itself. They need visibility, constraint and the ability to govern outcomes rather than react to them after failure.
Right now, governance is implicit, opaque and centralized. That is tolerable for demos. It is unacceptable for real work.
Joni Mitchell never accepted an instrument as it was handed to her. She tuned it — again and again — until it matched the sound she heard in her heart. She custom-tuned her guitar for many of her songs, including “California.”

Governance in AI should work the same way: not as control imposed from above, but as user-level tuning that lets people shape how the system behaves, remembers and responds.
AI systems that do not give users control will be treated as toys, not tools.
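What user-level tuning could look like in practice, as a minimal sketch. The policy fields and the gate function are assumptions, not an existing API; the point is that the knobs sit with the user rather than buried in the platform.

```python
from dataclasses import dataclass

@dataclass
class UserPolicy:
    """Hypothetical per-user governance knobs -- tuning, not top-down control."""
    max_spend_per_request: float = 0.05   # dollars: hard ceiling
    escalate_above: float = 0.02          # ask the user before spending more
    refuse_topics: tuple[str, ...] = ()
    explain_decisions: bool = True

def govern(request: str, estimated_cost: float, policy: UserPolicy) -> str:
    """Decide what the system may do -- before it does it."""
    if any(topic in request.lower() for topic in policy.refuse_topics):
        return "refuse"
    if estimated_cost > policy.max_spend_per_request:
        return "deny"
    if estimated_cost > policy.escalate_above:
        return "escalate"  # surface the decision instead of silently spending
    return "run"

policy = UserPolicy(refuse_topics=("medical diagnosis",))
print(govern("summarize this contract", estimated_cost=0.01, policy=policy))  # run
print(govern("summarize this contract", estimated_cost=0.03, policy=policy))  # escalate
```

Like a retuned guitar, the defaults are a starting point, not the instrument.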
If you believe governance is a problem you can solve later, please know that a solution to this problem has been filed and is patent pending.
3. Optimization
Amazon’s AI problem is not demand. It is unprovisioned compute.
Today, most AI systems operate the way cloud did in its earliest days:
A request arrives.
The system spins up resources.
The job runs.
Crucially, the same compute headroom is provisioned for every request, regardless of complexity. To guarantee that a query completes, generous headroom is allocated to every job, simple and complex alike, even though the vast majority of requests never need it.
There is no meaningful pre-execution understanding of:
- how expensive a request will be,
- whether the compute is justified,
- or whether a cheaper execution path would produce an acceptable result.
Capacity is allocated first.
Cost is discovered later.
That is the opposite of how Amazon built its empire.
In every other part of Amazon’s business, provisioning is sacred. Inventory is forecast. Warehouses are right-sized. Logistics are optimized before trucks roll. Capacity is allocated based on expected value. Waste is designed out before it happens.
AI breaks that discipline.
Most LLM workloads today are blindly provisioned:
- The system does not estimate cost before execution
- It does not right-size the model to the task
- It does not gate execution by value
- It does not offer cheaper alternatives up front
- It does not enforce predictable spend envelopes
As a result, inference costs scale linearly — or worse — with usage. The more successful the AI system becomes, the more expensive it is to operate. That is an inversion of Amazon’s core operating philosophy.
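Each of those five defaults can be inverted before the job runs. A minimal sketch, with invented model tiers, prices and capability scores; the shape is what matters: estimate first, right-size the model, gate by value and stay inside a spend envelope.

```python
# Invented model tiers: (name, cost per request in dollars, capability score 0-1).
TIERS = [("nano", 0.0005, 0.55), ("mid", 0.004, 0.80), ("max", 0.040, 0.98)]

def provision(task_difficulty: float, value: float, budget_left: float):
    """Pick the cheapest model expected to be good enough -- before running."""
    for name, cost, capability in TIERS:     # cheapest first
        if capability < task_difficulty:
            continue                         # right-size: skip too-weak tiers
        if cost > value:
            return None, "declined: compute exceeds the value of the answer"
        if cost > budget_left:
            return None, "declined: spend envelope exhausted"
        return name, f"run on {name} (${cost:.4f} of ${budget_left:.4f} budget)"
    return None, "declined: no tier capable enough within policy"

print(provision(task_difficulty=0.6, value=0.01, budget_left=1.00))
# ('mid', 'run on mid ($0.0040 of $1.0000 budget)')
print(provision(task_difficulty=0.3, value=0.01, budget_left=1.00))
# ('nano', ...) -- most requests never need the biggest model
```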
Why no one provisions today
Provisioning doesn’t happen because current AI stacks were never designed for it.
- Token-based pricing hides real cost.
- Model selection is opaque.
- Compute paths are unpredictable.
- There is no standard way to estimate FLOPs before execution (though rough rules of thumb exist; see the sketch after this list).
- And there is no governance layer empowered to say “this job is not worth that much compute.”
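On the FLOPs point: no standard exists, but rough rules of thumb do. The sketch below uses the common approximation that a transformer forward pass costs about 2 × parameters FLOPs per token; the dollars-per-FLOP rate and the value threshold are invented for illustration.

```python
def estimate_inference_flops(model_params: float, prompt_tokens: int,
                             max_output_tokens: int) -> float:
    """Rule-of-thumb forward-pass cost: ~2 * N parameters per token processed."""
    total_tokens = prompt_tokens + max_output_tokens
    return 2.0 * model_params * total_tokens

# Invented dollar cost per FLOP for some hardware path; not a real figure.
DOLLARS_PER_FLOP = 2e-16

def preflight(model_params: float, prompt_tokens: int, max_output_tokens: int,
              value_of_answer: float) -> dict:
    """Answer the pre-execution unknowns *before* any compute is allocated."""
    flops = estimate_inference_flops(model_params, prompt_tokens, max_output_tokens)
    cost = flops * DOLLARS_PER_FLOP
    return {
        "estimated_cost": round(cost, 4),     # how expensive will this be?
        "justified": cost < value_of_answer,  # is the compute worth it?
        "try_cheaper_path": cost > 0.1 * value_of_answer,  # would a smaller model do?
    }

print(preflight(model_params=70e9, prompt_tokens=1200,
                max_output_tokens=500, value_of_answer=0.10))
```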
So teams default to the safest option: run the biggest model, burn the compute and deal with the bill later.
That approach works in demos.
It fails at Amazon scale.
Why provisioning is the 10× lever
The majority of AI workloads do not require maximum intelligence.
Summaries, classifications, lookups, transformations and routine reasoning can often be done with:
- smaller models
- shorter context windows
- cheaper hardware paths
- or approximate results that are “good enough”
But without provisioning, every request is treated as mission-critical.
Pre-execution provisioning — estimating cost, matching the task to the cheapest acceptable model and gating execution accordingly — can reduce total AI compute spend dramatically. In many environments, cost reductions of up to 90% are achievable, not through better hardware but through better decisions made before the job runs.
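The “up to 90%” figure is arithmetic, not magic. Under an illustrative routing mix (85% of requests served by a small model at 2% of flagship cost, 10% by a mid-tier model at 20%, and 5% that genuinely need the flagship), the blended cost works out as follows:

```python
# Illustrative routing mix: (share of requests, cost relative to the flagship model).
mix = [(0.85, 0.02),   # simple jobs -> small model at ~2% of flagship cost
       (0.10, 0.20),   # moderate jobs -> mid-tier model
       (0.05, 1.00)]   # genuinely hard jobs -> flagship

blended = sum(share * rel_cost for share, rel_cost in mix)
print(f"blended cost: ~{blended:.3f}x flagship")  # ~0.087x
print(f"savings: ~{1 - blended:.1%}")             # ~91.3%
```

Change the mix and the number moves, but the lever is the same: route before you run.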
This is not a research problem.
It is a systems problem.
And it is exactly the kind Amazon has solved before.
And if you believe optimization based on compute allocation is something you can defer, please know this:
It’s not impossible. It’s patent pending.
Jeff, Amazon has solved this problem before.
- It solved it in retail by refusing to ship inventory blindly.
- It solved it in logistics by sizing capacity before trucks rolled.
- It solved it in AWS by making compute predictable, measurable and governable.
AI should be no different.
Intelligence that cannot be provisioned, constrained or optimized in advance will never scale safely inside Amazon. It will be capped, sandboxed and quietly limited — not because it lacks promise, but because it violates the discipline that made Amazon work.
The path forward is not more intelligence.
It is intelligence that behaves like a system Amazon can run.
Solve memory, governance and optimization together — and AI stops being an unbounded expense and becomes another engine of leverage. Ignore them, and AI becomes the rare thing Amazon has always rejected: a cost that grows faster than control.
I know which side of that trade you prefer.