Estimated reading time: 10 minutes

Key Takeaways:

  • AI agents are no longer just a topic for keynotes and boardroom decks. At enterprises, they now run at scale in production and deliver measurable results on core processes.
  • 54% of organizations now deploy agents in core operations, and 80% of those deployments report measurable ROI with an average return of 171% and payback periods of seven to nine months.
  • The winners are not the organizations with the best model but those with the best harness: targeted tool access, clean context, and a human as the final check on sensitive decisions.
  • Klarna and JPMorgan show that even the largest organizations can use AI agents to automate work on the order of hundreds to thousands of FTEs, provided they build their fallback paths before they scale up.
  • In every successful case, the human remains indispensable as the final check, escalation point, and ultimate accountable party. But with the right approach you already get remarkably far: the combination of agent and human structurally outperforms either on its own.
  • For SMEs, this is the signal to stop running endless pilots and put one well-defined process truly into production. The barrier no longer lies in the technology, but in the execution.

Table of Contents

  1. Key Takeaways
  2. Table of Contents
  3. Introduction: anyone can talk, deploying is something else
  4. Deep case 1: Klarna and the rebalanced equation
  5. Deep case 2: JPMorgan, 450 use cases in production
  6. Four shorter examples from the field
  7. What distinguishes the enterprises making it work
  8. What SMEs can do with this tomorrow
  9. Closing thoughts
  10. Frequently Asked Questions
  11. Sources

Introduction: anyone can talk, deploying is something else

The theme is in the air at every conference. Every boardroom update contains a slide about "agentic AI". Every consultant now has a framework. But behind the slide decks a second group has emerged, one that is conspicuously quiet: the enterprises that have stopped discussing agents and simply keep them running.

The Enterprise AI Agents 2026 report from G2 estimates that 54% of organizations are currently deploying AI agents in core operations. 80% of those deployments report measurable ROI, with a median 40% reduction in cost per unit on mature workflows (G2, 2026). Those are no longer pilot numbers. Those are production numbers from companies that have stopped talking and started building.

In this article: two examples in detail and four in brief, plus the pattern they share and what SMEs can do with it this week.

Deep case 1: Klarna and the rebalanced equation

The most cited case is still Klarna. In Q3 2025, the Swedish payment service reported that its AI agent did the work of 853 full-time employees and saved the company $60 million in costs. Customer service response times dropped from fifteen minutes to under two minutes, and the cost per transaction fell to roughly $0.19, a 40% reduction in two years (Customer Experience Dive, OpenAI case study).

What makes the story interesting for the broader field is the second half. Klarna started hiring people back for customer service last year. Not because the agent was performing worse, but because some conversation types and escalations involved issues that no prompt and no retrieval could resolve. Klarna now publicly calls it an "AI-first, human-hybrid" model: the agent still handles the lion's share, but every customer can speak to a human at any moment (CX Dive on the recalibration).

The lesson for anyone looking at these figures is twofold. On the one hand: the savings are real and lasting. On the other: the organizations that are furthest along today first tried to automate everything and then rebuilt where needed. Anyone who skips that phase and aims directly for the "definitive" mix learns more slowly where their agent's limits actually lie.

Deep case 2: JPMorgan, 450 use cases in production

At the other end of the spectrum sits JPMorgan Chase. Earlier this year, more than 450 AI use cases were running in production, with plans to reach 1,000 by year end. The internal LLM Suite is used daily by roughly 230,000 employees. Management reports $1.5 to $2 billion in annual cost savings, plus separate contributions such as $220 million from AI-driven credit card upgrades and $100 million from AI-generated recommendations in the corporate bank (Artificial Intelligence News, CNBC).

Particularly notable is fraud detection: an agentic pipeline reduced false positives in anti-money-laundering alerts by 95%. For a bank, that means thousands of investigations per week that no longer need to happen, and compliance staff who can shift their attention to the real signals.

What stands out here is that JPMorgan does not pick one model or one vendor, but builds a platform on which use cases run as modules. The CIO manages a $19.8 billion budget and 65,000 technologists to maintain that infrastructure. For SMEs that budget is irrelevant, but the pattern is not: the bank first builds the generic layer (tool access, audit, observability) and only afterwards the specific agents.

Four shorter examples from the field

Moderna. The biotech company rolled out ChatGPT Enterprise to roughly 3,000 employees. Within two months, 750 internal GPTs had been built by the employees themselves, every user averaged 120 conversations per week, and the legal team hit 100% adoption (Constellation Research, OpenAI case study). Interesting detail: the legal team, typically the last bastion of manual work, was the first function with full adoption.

Salesforce as its own customer. In the first year that Salesforce used its own Agentforce product internally, the service agent handled more than 1.5 million support requests, the vast majority without human intervention. The internal SDR agent processed 43,000 leads and generated $1.7 million in new pipeline (Salesforce News).

reMarkable. The Norwegian e-paper manufacturer put its first AI agent into production within three weeks. "Mark" is a knowledge base FAQ agent that has now handled more than 18,000 service conversations, with NPS and deflection numbers improving weekly (Salesforce customer stories). Three weeks from first idea to production is an important number: it is roughly the point beyond which many SME pilots fall apart due to scope creep.

Macquarie Bank. Using Google's Gemini Enterprise, the Australian bank reports a 38% higher self-service ratio, 40% fewer false positives in alerts, a 24% lower headcount in its retail arm, and at the same time 50% growth in mortgage volume (G2 report). It is the same combination as at JPMorgan: costs down and capacity for growth up.

What distinguishes the enterprises making it work

When you place these cases side by side, a pattern emerges that has remarkably little to do with model choice.

They start small and focused. Klarna started with one conversation class in customer service. Moderna started with legal review. Macquarie started with fraud alerts. None of these organizations started with an "AI strategy for the entire enterprise"; they started with one process painful enough to address.

They build escalation in before they scale up. In every successful case, a human is the exit route. Klarna now makes this explicit, JPMorgan builds it in through compliance officers, and Salesforce leaves ultimate ownership of the customer relationship with account managers. The agents take work away, not responsibility.

This is the most persistent pattern across every case: the human remains indispensable. Not as a necessary evil, but as a deliberate design choice. At the same time, the same cases show that with the right approach you already get remarkably far: on many processes, agents can handle 80 to 90% of the volume, provided the remaining 10 to 20% is routed carefully to a human. So it is not either/or, but a well-choreographed combination in which each plays its strongest role.
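
To make the split between agent volume and human escalation concrete, here is a minimal Python sketch of a routing rule. It assumes, hypothetically, that the agent returns a draft answer with a self-reported confidence score and that certain topics are always treated as sensitive. The threshold of 0.8, the topic names, and the `route` and `escalation_rate` functions are illustrative choices, not taken from any of the cases above.

```python
from dataclasses import dataclass

# Hypothetical topics that always go to a human, regardless of agent confidence.
SENSITIVE_TOPICS = {"refund_dispute", "account_closure", "complaint"}

# Hypothetical threshold; in practice you would tune it against measured error rates.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class AgentDraft:
    ticket_id: str
    topic: str
    answer: str
    confidence: float  # assumed: a score between 0 and 1 produced by the agent


def route(draft: AgentDraft) -> str:
    """Return 'agent' if the draft may be sent automatically, else 'human'."""
    if draft.topic in SENSITIVE_TOPICS:
        return "human"  # sensitive decisions always get a human final check
    if draft.confidence < CONFIDENCE_THRESHOLD:
        return "human"  # low confidence: escalate instead of guessing
    return "agent"


def escalation_rate(drafts: list[AgentDraft]) -> float:
    """Share of drafts routed to a human."""
    if not drafts:
        return 0.0
    return sum(route(d) == "human" for d in drafts) / len(drafts)


if __name__ == "__main__":
    drafts = [
        AgentDraft("T1", "delivery_status", "Your parcel ships tomorrow.", 0.95),
        AgentDraft("T2", "refund_dispute", "Refund approved.", 0.99),
        AgentDraft("T3", "invoice_copy", "Here is your invoice.", 0.62),
        AgentDraft("T4", "opening_hours", "We are open 9 to 17.", 0.91),
    ]
    for d in drafts:
        print(d.ticket_id, route(d))
    print(f"escalation rate: {escalation_rate(drafts):.0%}")
```

The design choice worth noting is that escalation is decided before anything reaches the customer, and that sensitive topics bypass the confidence check entirely, so a confident but wrong answer on a refund still lands with a person.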

They measure cost per unit, not number of prompts. The most cited reports talk about cost per transaction, resolution time, false-positive ratio, and NPS, not about how often the model was called. That forces teams to design use cases so they are measurable in the first place.
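
As a minimal sketch of measuring per unit rather than per prompt, the Python below assumes, hypothetically, that each handled case is logged with its total cost (model, tooling, and human time), its resolution time, and, for alert workflows, whether an alert was raised and whether it was a real issue. The `CaseRecord` fields and the example figures are illustrative placeholders, not data from the reports cited.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class CaseRecord:
    cost: float                # hypothetical: model, tooling, and human handling cost of this case
    resolution_minutes: float  # time from first contact to resolution
    flagged: bool = False      # alert workflows only: did the system raise an alert?
    true_issue: bool = False   # alert workflows only: was the alert a real issue?


def cost_per_unit(cases: list[CaseRecord]) -> float:
    """Total cost divided by the number of handled cases, not by the number of model calls."""
    if not cases:
        raise ValueError("no cases logged yet")
    return sum(c.cost for c in cases) / len(cases)


def median_resolution_minutes(cases: list[CaseRecord]) -> float:
    return median(c.resolution_minutes for c in cases)


def false_alert_share(cases: list[CaseRecord]) -> float:
    """Share of raised alerts that were not real issues (strictly, the false discovery rate)."""
    alerts = [c for c in cases if c.flagged]
    if not alerts:
        return 0.0
    return sum(not c.true_issue for c in alerts) / len(alerts)


if __name__ == "__main__":
    cases = [
        CaseRecord(cost=0.20, resolution_minutes=2.0),
        CaseRecord(cost=0.15, resolution_minutes=1.5),
        CaseRecord(cost=3.10, resolution_minutes=25.0),  # escalated to a human, hence the higher cost
        CaseRecord(cost=0.25, resolution_minutes=3.0, flagged=True, true_issue=False),
        CaseRecord(cost=0.30, resolution_minutes=4.0, flagged=True, true_issue=True),
    ]
    print(f"cost per unit: {cost_per_unit(cases):.2f}")
    print(f"median resolution: {median_resolution_minutes(cases):.1f} min")
    print(f"false alerts: {false_alert_share(cases):.0%}")
```

Note that the escalated case is included in the per-unit cost: the human share is part of what a unit actually costs, which is exactly what a count of model calls would hide.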

They invest in the layer underneath. Tools, audit, observability, and context pipelines are treated as infrastructure at each of these organizations, not as a byproduct. Deloitte's State of AI in the Enterprise report consistently names this as the dividing line between organizations that scale up and organizations that stay stuck in pilot mode (Deloitte).
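
What "the layer underneath" can mean in code is sketched below in Python. It assumes, hypothetically, that each agent is registered with an explicit allowlist of tools and that every tool call passes through one wrapper that writes an audit record. The `Harness` class, the tool names, and the JSON-lines log format are illustrative inventions, not any vendor's API.

```python
import json
import time
from typing import Any, Callable


class Harness:
    """Hypothetical shared layer: tool registry, per-agent allowlists, and an audit trail."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}
        self._allowlists: dict[str, set[str]] = {}
        # JSON lines kept in memory here; a real setup would write to durable storage.
        self.audit_log: list[str] = []

    def register_tool(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def register_agent(self, agent: str, allowed_tools: set[str]) -> None:
        self._allowlists[agent] = set(allowed_tools)

    def _audit(self, entry: dict[str, Any]) -> None:
        # default=str keeps non-JSON arguments (dates, objects) from breaking the log
        self.audit_log.append(json.dumps(entry, default=str))

    def call(self, agent: str, tool: str, **kwargs: Any) -> Any:
        """Run a tool on behalf of an agent, enforcing the allowlist and logging every attempt."""
        entry: dict[str, Any] = {"ts": time.time(), "agent": agent, "tool": tool, "args": kwargs}
        if tool not in self._allowlists.get(agent, set()):
            entry["result"] = "denied"
            self._audit(entry)
            raise PermissionError(f"{agent} may not call {tool}")
        try:
            result = self._tools[tool](**kwargs)
        except Exception as exc:
            entry["result"] = f"error: {exc!r}"
            self._audit(entry)
            raise
        entry["result"] = "ok"
        self._audit(entry)
        return result


if __name__ == "__main__":
    harness = Harness()
    harness.register_tool("lookup_order", lambda order_id: {"order_id": order_id, "status": "shipped"})
    harness.register_tool("issue_refund", lambda order_id, amount: f"refunded {amount} on {order_id}")
    # The FAQ agent may read order status but may not move money.
    harness.register_agent("faq_agent", {"lookup_order"})

    print(harness.call("faq_agent", "lookup_order", order_id="A-123"))
    try:
        harness.call("faq_agent", "issue_refund", order_id="A-123", amount=50)
    except PermissionError as err:
        print("blocked:", err)
    for line in harness.audit_log:
        print(line)
```

Keeping the allowlist and the audit trail in one shared layer means a second or third agent reuses them instead of reimplementing them, which is the same reasoning behind building the generic layer before the specific agents.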

What SMEs can do with this tomorrow

For SMEs this is all good news, for two reasons. First: enterprises have spent the past few years answering the hard questions, such as which use cases work (customer service, classifying inbound questions, contract review, fraud detection, lead qualification), how to build in escalation, and how to measure. Those patterns are now public.

Second: a small business does not need 450 use cases. One use case that works well, for one well-defined process, already makes a visible difference in an SME's P&L. reMarkable's three-week implementation is a far more realistic template than JPMorgan's $19.8 billion technology budget.

Concretely: pick one recurring process that costs a lot of time or frustration. First-line customer questions, quote drafting, invoice classification, internal HR FAQ. Build an agent around it with targeted tooling and clear escalation. Measure it for two months. Then decide whether to roll out to the next process or deepen this one first. That is the entire recipe.
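
For the "measure it, then decide" step, the arithmetic behind payback period and ROI (the kind of metrics behind the 171% and seven-to-nine-month figures in the key takeaways) can be sketched in Python. All inputs are hypothetical placeholders for your own baseline and pilot measurements, and ROI is computed here as net gain divided by build cost over a chosen horizon, which is one common definition, not necessarily the one behind the figures cited in this article.

```python
from dataclasses import dataclass


@dataclass
class PilotNumbers:
    # All values are hypothetical inputs from your own baseline and pilot measurements.
    units_per_month: int          # e.g. customer questions handled per month
    baseline_cost_per_unit: float  # cost per unit before the agent
    agent_cost_per_unit: float     # blended cost per unit: agent running cost plus human escalations
    build_cost: float              # one-off cost to build and integrate the agent


def monthly_savings(p: PilotNumbers) -> float:
    return p.units_per_month * (p.baseline_cost_per_unit - p.agent_cost_per_unit)


def payback_months(p: PilotNumbers) -> float:
    """Months until cumulative savings cover the one-off build cost."""
    savings = monthly_savings(p)
    if savings <= 0:
        return float("inf")  # at these numbers the agent never pays for itself
    return p.build_cost / savings


def roi(p: PilotNumbers, horizon_months: int) -> float:
    """(total savings - build cost) / build cost over the horizon."""
    gain = monthly_savings(p) * horizon_months
    return (gain - p.build_cost) / p.build_cost


if __name__ == "__main__":
    pilot = PilotNumbers(units_per_month=2000, baseline_cost_per_unit=4.00,
                         agent_cost_per_unit=1.50, build_cost=40000.0)
    print(f"monthly savings: {monthly_savings(pilot):.0f}")
    print(f"payback: {payback_months(pilot):.1f} months")
    print(f"ROI over 24 months: {roi(pilot, 24):.0%}")
```

With these placeholder numbers the sketch prints monthly savings of 5000, a payback of 8.0 months, and an ROI of 200% over 24 months. Because human escalations are folded into `agent_cost_per_unit`, a pilot that escalates too often shows up directly as a longer payback.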

Closing thoughts

AI agents are no longer a promise of what technology will someday be able to do. They are now a collection of production numbers, with winners showing how to make the leap and enough openly shared mistakes to chart your own course. The barrier no longer lies in the models, the tooling, or the price. It lies in the willingness to actually tackle one concrete process instead of continuing to talk about the potential.

That last step is exactly where many SMEs still get stuck. Not because they fall short, but because the hype has taught them to think big. The enterprises that are winning now thought small and executed consistently. That is a playbook any entrepreneur can use.

Frequently Asked Questions

Aren't these numbers mostly vendor marketing?

A portion inevitably comes from case studies by OpenAI, Salesforce, and Google. But independent sources (G2, Deloitte, the financial and business press) confirm the broader pattern: 80% of enterprise deployments report measurable ROI, and most of the figures can now also be traced in annual reports.

Do agents replace employees?

In some roles, work shifts substantially. Klarna's customer service is the clearest example. At the same time, Macquarie and JPMorgan show that capacity often shifts toward growth work rather than disappearing. The Klarna recalibration also shows that full replacement is usually not the end state.

What is the smallest sensible starting point?

One process, one team, one agent with clear escalation to a human. reMarkable's three weeks to production is a good benchmark. If your first pilot takes three months without anything in production, that is almost never due to the technology.

How does this relate to the EU AI Act?

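Most customer service and office agents fall in the minimal to limited risk category. However, agents that interact directly with people carry transparency obligations (users must know they are dealing with an AI system), and logging becomes a hard requirement for high-risk applications. A well-organized harness (context, tools, audit, escalation) already covers much of what those obligations ask for.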

Sources