Agentic AI

Why AI Agents Struggle in Real-World CRM Tasks—and What We Can Do About It

Apr 23, 2025

Radek Novotny, CEO Superface


AI agents that are powered by large language models (LLMs) have the potential to transform the way businesses interact with software.

However, a recent study supported by Salesforce paints a more sobering picture: without significant improvements, these AI agents may not reach the precision required for effective use in enterprise environments.

This article breaks down the CRMArena study's findings, identifies what is holding AI agents back, and outlines a pragmatic approach for getting accurate, reliable agents into production today.

The Test: How AI Agents Perform on Real CRM Tasks

The Salesforce AI Research team introduced CRMArena, a benchmark for evaluating LLM agents in realistic CRM workflows. It includes nine tasks, validated by domain experts, mapped to roles like Service Manager and Analyst. Tasks range from lead creation to escalations and multi-step case handling.
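To make the setup concrete, here is a minimal sketch of a CRMArena-style evaluation loop. The task format, the `run_agent` callable, and the exact-match scoring are simplifying assumptions for illustration, not the benchmark's actual code.

```python
# Minimal sketch of a CRMArena-style evaluation loop (illustrative only).
# `run_agent` stands in for whatever LLM agent is being tested; the task
# format and the exact-match scoring are simplifying assumptions.

from dataclasses import dataclass
from typing import Callable


@dataclass
class CRMTask:
    persona: str        # e.g. "Service Manager" or "Analyst"
    instruction: str    # natural-language task description
    expected: str       # ground-truth answer or record ID


def evaluate(run_agent: Callable[[str, str], str], tasks: list[CRMTask]) -> float:
    """Run the agent on every task and report end-to-end accuracy."""
    passed = 0
    for task in tasks:
        answer = run_agent(task.persona, task.instruction)
        if answer.strip() == task.expected.strip():
            passed += 1
    return passed / len(tasks)


# Usage: accuracy = evaluate(my_agent, tasks); the study reports <40% for top agents.
```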

Key finding: accuracy below 40%

Even with function calling and prompting, top agents completed fewer than 40% of tasks successfully.

That’s not just low performance. That’s non-functional at enterprise scale!


Why Agents Fail: It’s Not the LLM’s Fault

AI agents don’t struggle because they're "dumb." They struggle because they lack:

1. Domain Intelligence

General-purpose LLMs don’t know how your CRM works. They don’t know what “associating a contact with an opportunity” really means in your context.

2. Tool Awareness

APIs weren't built for agents. Agents need intelligent tools that abstract complexity and guide actions reliably.

3. Execution Memory

Most real-world tasks span multiple steps. Today’s agents lose track, misinterpret state, or skip important rules.

These challenges make accurate execution nearly impossible unless something changes; the sketch below makes the three gaps concrete.
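Here is a minimal sketch of what a single agent-facing tool has to provide to close these gaps: domain knowledge to resolve references, guardrails instead of guesses, and explicit execution state. The `crm` client and every function name here are hypothetical placeholders, not a real library.

```python
# Sketch of an "agent-aware" tool: it encodes domain rules (guardrails),
# hides raw API complexity, and records execution state so multi-step
# work is not lost between calls. All names here are hypothetical.

class ExecutionMemory:
    """Keeps an explicit log of completed steps so the agent can resume."""
    def __init__(self):
        self.steps: list[dict] = []

    def record(self, step: str, result: dict) -> None:
        self.steps.append({"step": step, "result": result})


def associate_contact_with_deal(crm, memory: ExecutionMemory,
                                contact_email: str, deal_name: str) -> dict:
    # Domain intelligence: resolve fuzzy references to concrete record IDs
    contact = crm.find_contact(email=contact_email)
    deal = crm.find_deal(name=deal_name)

    # Guardrail: refuse to act on missing records instead of guessing
    if contact is None or deal is None:
        return {"status": "needs_clarification",
                "missing": "contact" if contact is None else "deal"}

    # Tool awareness: one high-level call instead of several raw API requests
    result = crm.create_association(contact["id"], deal["id"])

    # Execution memory: persist what happened for later steps
    memory.record("associate_contact_with_deal", result)
    return {"status": "done", "association": result}
```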


🚨 The 80% Barrier

There’s a pattern emerging in the industry: once an agent crosses 80% accuracy, companies start seeing real ROI.

But getting to 80%+ requires weeks of custom development, integrations, testing, and iteration—each use case becomes a mini software project.

In most organizations, this is unsustainable at scale. We’re building agents that require as much care and feeding as old-school automation.

What’s the Fix?

Agents need embedded intelligence, not just language skills.

That’s where Specialist Agents come in—a new layer in the agentic stack.

Superface builds these intelligent agents with:

  • Knowledge of your system (CRM, project tool, etc.)

  • Understanding of your workflows

  • Built-in skills and guardrails

  • Rapid trainability and testability

Result: an agent that works right away—not weeks later.
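In practice, the host framework sees a Specialist Agent as a single high-level tool rather than dozens of raw endpoints. The sketch below uses the OpenAI-style `tools` schema to show what that wiring can look like; the tool name, schema, and dispatch step are illustrative assumptions, not Superface's actual interface.

```python
# Sketch: expose a Specialist Agent to a function-calling LLM as one tool.
# The schema follows the OpenAI tools format; all names are illustrative.

from openai import OpenAI

client = OpenAI()

crm_specialist_tool = {
    "type": "function",
    "function": {
        "name": "crm_specialist",
        "description": "Delegate a CRM task to a Specialist Agent that knows "
                       "the schema, workflows, and guardrails of this CRM.",
        "parameters": {
            "type": "object",
            "properties": {
                "task": {
                    "type": "string",
                    "description": "The CRM task, stated in natural language.",
                }
            },
            "required": ["task"],
        },
    },
}

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Log today's call on the Acme deal."}],
    tools=[crm_specialist_tool],
)
# The model responds with a tool call; the host forwards `task` to the
# Specialist Agent and returns its result as the tool message.
```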

From Failing to Functioning

Let’s see this in action.

Example: HubSpot CRM Task Completion
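To show what the agent is up against, here is a minimal sketch of the kind of multi-step HubSpot task it has to get exactly right: create a contact, find the right deal, and associate the two. It uses HubSpot's public CRM v3 REST endpoints via `requests`; the token, property values, and the association type label are assumptions you would need to adapt to your own portal.

```python
# Sketch of a multi-step HubSpot CRM task (create contact -> find deal ->
# associate them). Uses HubSpot's public CRM v3 REST API; the token and
# the association type label are placeholders to adapt to your portal.

import requests

BASE = "https://api.hubapi.com"
HEADERS = {"Authorization": "Bearer YOUR_PRIVATE_APP_TOKEN",  # placeholder
           "Content-Type": "application/json"}

# Step 1: create the contact
contact = requests.post(
    f"{BASE}/crm/v3/objects/contacts", headers=HEADERS,
    json={"properties": {"email": "jane@example.com",
                         "firstname": "Jane", "lastname": "Doe"}},
).json()

# Step 2: find the deal to attach it to (exact-match search on the deal name)
deals = requests.post(
    f"{BASE}/crm/v3/objects/deals/search", headers=HEADERS,
    json={"filterGroups": [{"filters": [
        {"propertyName": "dealname", "operator": "EQ", "value": "Acme Renewal"}]}]},
).json()
deal_id = deals["results"][0]["id"]  # a real agent must handle 0 or >1 matches, too

# Step 3: associate contact and deal (association type label may differ per portal)
requests.put(
    f"{BASE}/crm/v3/objects/contacts/{contact['id']}/associations/deals/"
    f"{deal_id}/contact_to_deal", headers=HEADERS,
)
# Every step depends on the previous one: a skipped rule or a lost ID
# anywhere in the chain means the whole task fails.
```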


Don’t Let AI Fail Quietly

The CRMArena study is a warning: AI agents won’t become reliable by magic. They won’t "just get better" with another prompt, another model update, or another plug-in.

But if we rethink what agents are—and give them the structure and tools they need—they can finally do what they promise.

And when they work, they really work.


  1. Identify Your Failing Scenarios

Typically, failures occur when a task requires precise adherence to business rules or complex coordination of multiple steps across CRM objects.

  2. Get Your Specialist Agent

We’ve helped companies unlock real, repeatable accuracy from their agents. Want to see if yours can perform better?

  • Let us know about your failing scenarios

  • Connect your agentic framework with Superface

  • Run a custom Specialist Agent for your CRM

-> Get started with Superface