It’s called a demo for a reason: It’s not how work gets done in the real world
AI is not ready for business. That isn’t an opinion. It’s what the data, the vendors and real-world trials now show:
- MIT researchers report that roughly 95 percent of generative AI pilots inside companies fail to move beyond experimentation.
- A recent McKinsey survey found that nearly 80 percent of companies using AI saw no meaningful impact on their bottom line.
- Even Microsoft CEO Satya Nadella has publicly acknowledged that Copilot integrations “for the most part don’t really work,” a problem so persistent that Microsoft has resorted to paying third parties to train customers who already bought the product but aren’t using it.
These are not edge cases. They are signals.
Why AI keeps failing at work
The problem is not talent, effort or imagination. It’s design.
Modern AI systems are probabilistic. They are extremely good at guessing what word, image or action is most likely to come next based on patterns learned from vast amounts of prior data. That makes them fluent. It does not make them reliable.
Work systems, by contrast, are deterministic. When a human specifies a task in a workplace, the expectation is not a good guess. It is correct execution. Deadlines, money, permissions, compliance and accountability all depend on systems doing exactly what was specified, not what seems statistically likely based on how others behaved in the past.
This is why AI excels at drafting emails, summarizing documents and brainstorming ideas, and why it struggles the moment it is asked to reliably handle processes, files, money or time. Fluency feels like intelligence, but fluency is not control. Guessing is not execution.
What researchers are finding
The MIT and McKinsey numbers point to the same conclusion from different angles. Companies are experimenting heavily with AI, but those experiments are not translating into durable operational gains. Pilots stall. Tools get sidelined. Costs accrue without corresponding returns.
That gap matters because the current bet behind Agentic AI is that companies will trust these systems with real authority: to move data, initiate actions, spend money and manage employee time. The evidence suggests that trust may be a long time coming.
What happens when a capable user actually tries
A recent report from The Information put this to the test with Anthropic’s Cowork, an AI agent designed to automate workplace tasks. The assignment was modest: create a simple “word of the day” bot.
The reporter began with a 352-word specification. That detail matters. Most office workers would never write one. Even with that level of clarity, the process took roughly three hours and required repeated debugging using Terminal – an application a typical user would be afraid to use.
When the app finally ran, the results were telling:
- Definitions were missing or incorrect.
- Requested example sentences never appeared.
- Outputs were incomplete.
The reporter’s conclusion was blunt: “It’s hard to imagine Cowork taking off with a general audience until it works more smoothly and offers a gentler learning curve.”
This was not a novice user struggling. It was a technically savvy reporter following instructions carefully. If this is what success looks like, scale becomes difficult to imagine.
Why even success creates new risks
WIRED’s reporting highlights an even deeper problem. Agentic systems like Cowork introduce persistent security risks that are not theoretical. These tools are susceptible to prompt injection attacks, hidden instructions embedded in content that can redirect an agent’s behavior without the user realizing it.
Anthropic itself warns users to be cautious. On its support page, the company notes that since Claude can read, write and permanently delete files, users should avoid granting access to sensitive information such as financial documents, credentials or personal records. It recommends creating backups and limiting agents to folders containing only nonsensitive data.
That advice is revealing:
A system marketed as a workplace assistant comes with instructions not to trust it with the very materials that define work.
Wired’s reporter was able to get Cowork to sort files into folders by date. That is a useful trick. It is also something a user could do on the original Macintosh in 1984.
The long wait ahead
If enterprise AI adoption depended only on novelty or hype, the problem would already be solved. What’s holding it back is something more fundamental.
Agentic AI promises autonomy, but delivers unpredictability. It promises efficiency, but requires expertise. It promises trust, but asks users to limit exposure, keep backups and avoid sensitive data.
Until AI systems are designed for deterministic execution rather than probabilistic guessing, demos will continue to look impressive while real work quietly resists them. And companies waiting for AI agents they can trust with data, money and time may find themselves waiting much longer than the marketing suggests.
– Published Tuesday, February 3, 2026