Taking AI tool use to human level

Mar 12, 2025

Radek Kysely

Lead AI Developer

Build AI agents that can use external tools with human-level accuracy.

Ever since OpenAI pioneered function calling, developers and agent builders (including us) have been spending countless hours extending their agents’ capabilities with third-party systems via tools.

And while the entire industry is investing heavily in developing LLMs that beat humans in reasoning tasks, their performance when using tools still often remains far below human efficiency.

Skill gap

The tools we’ve seen on the market so far have clearly focused on delivering different abstractions over APIs. The most popular ones fall into two categories:

API Connectors

1st generation

These tools directly mirror the API they connect to. The advantage is that you can generate them from API documentation (or an OpenAPI specification). However, LLMs have difficulty reasoning about how to use these tools, and don’t know how to interpret the tool results.

Examples: Composio, Arcade, Anon
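
To make the first-generation approach concrete, here is a minimal sketch of how a connector tool might be derived mechanically from a single OpenAPI operation. The operation fragment, the `tool_from_operation` helper and the output shape (loosely modelled on the JSON-schema style used for function calling) are all illustrative assumptions, not any vendor's actual generator.

```python
from typing import Any


def tool_from_operation(method: str, path: str, operation: dict[str, Any]) -> dict[str, Any]:
    """Mirror one OpenAPI operation as a function-calling tool definition."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for param in operation.get("parameters", []):
        # Copy each parameter schema verbatim; no extra semantics are added.
        properties[param["name"]] = dict(param.get("schema", {}))
        if param.get("required", False):
            required.append(param["name"])
    return {
        "name": operation.get("operationId", f"{method}_{path}"),
        "description": operation.get("summary", ""),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


# Hypothetical operation fragment, invented for illustration only.
append_values = {
    "operationId": "appendValues",
    "summary": "Append values.",
    "parameters": [
        {"name": "sheetId", "in": "path", "required": True, "schema": {"type": "string"}},
        {"name": "range", "in": "query", "required": True, "schema": {"type": "string"}},
    ],
}

tool = tool_from_operation("post", "/sheets/{sheetId}/values", append_values)
print(tool["name"], tool["parameters"]["required"])
```

The generated definition says nothing about what a valid `range` looks like or how to read the response, which is exactly the gap the model is left to reason across on its own.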

Semantic Tools

2nd generation

Semantic tools build on API tools and are optimised to be used with LLMs. This usually includes adding descriptions of the tool semantics, working with a state, or removing ambiguity from the data models. For some APIs, this can mean rewriting the API interface completely.

Examples: Anthropic’s MCP, Superface’s intelligent tools

When developing AI tools based on APIs, we inevitably run into the inherent limitations of the APIs themselves, particularly their lack of guidance on proper usage in different use cases (e.g. checking a spreadsheet’s format before appending a new row to it).
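
The spreadsheet case can be sketched in a few lines. This is a hedged illustration using an in-memory list of rows in place of a real spreadsheet API; the `append_record` helper and the sample data are hypothetical. The point is the order of operations: inspect the existing format first, then append.

```python
def append_record(sheet: list[list[str]], record: dict[str, str]) -> list[str]:
    """Append a record only after inspecting the sheet's header row."""
    if not sheet:
        raise ValueError("sheet has no header row to inspect")
    header = sheet[0]  # Step 1: look at the existing format.
    unknown = set(record) - set(header)
    if unknown:
        raise ValueError(f"fields not present in sheet: {sorted(unknown)}")
    # Step 2: order values by the sheet's columns, leaving missing ones blank.
    row = [record.get(column, "") for column in header]
    sheet.append(row)
    return row


# Sample data invented for illustration.
sheet = [["Name", "Email", "Status"], ["Ada", "ada@example.com", "Lead"]]
append_record(sheet, {"Email": "grace@example.com", "Name": "Grace"})
print(sheet[-1])
```

An API that only exposes an "append row" endpoint gives no hint that the first step is needed, so a tool mirroring that API would happily write the values in whatever order the model supplied them.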

Without genuine understanding of underlying systems and business processes, AI agents are left with a significant skill gap. They struggle with planning the tool use and usually fail when encountering more complex or slightly ambiguous scenarios.

Tools for successful agents need to be AI-first. They should communicate naturally, understand the user’s intent, and learn the user’s preferences.

Today we’re introducing Specialists—the next generation of AI tools.

Introducing Specialists

Specialists are tool-specific agents that plan and perform tasks on behalf of your agent’s users. Thanks to their own reasoning model, which combines tool expertise, domain knowledge and user preferences, Specialists can operate at a level similar to that of a human.

Early testing shows that Specialists can effectively complete both simple and complex tasks involving multiple cross-dependent steps, dramatically improving outcomes compared to previous generations of AI tools.

Although Specialists were initially made with complex tools in mind, we also see significant improvements in the completeness and reliability of task accomplishment with simpler tools, such as calendar or to-do list management.

How a Specialist works

Specialists differ fundamentally from traditional tool repositories. Instead of turning software services into disconnected lists of functions, Specialists transform them into self-contained AI agents that are experts on using that specific service.

When your agent decides to use a tool, it passes the request in its own words, along with its context and expectations. The Specialist then makes sure that all the work is completed correctly, and responds with the results.

Specialists run their own LLM reasoning and achieve their results by combining:

Operational knowledge: Deep understanding of the tool’s capabilities and functionalities
Domain expertise: Familiarity with the specific use cases and scenarios relevant to the tool
Loaded context: Stateful settings or preferences loaded from the external system to operate correctly
Agent preferences: Instructions from the client agent builder to steer the tool use (e.g. to comply with the organization’s requirements)
Memory: User and agent preferences learnt from the interactions and feedback
Connectors with managed authentication: Adapters for interacting with the target system (APIs, browsers, SQL databases, Anthropic MCP, etc.)
Guardrails & access control: Built-in mechanisms to enforce policies, limit actions based on user permissions, and ensure AI agents operate safely within defined boundaries (available later)
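
As a rough mental model, the ingredients listed above can be pictured as inputs to a single reasoning step. The sketch below is an assumption-heavy illustration, not Superface's implementation: the `Specialist` class, its field names and the `stub_reasoner` stand-in for an LLM call are all hypothetical, and connectors and guardrails are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Specialist:
    """Hypothetical tool-specific agent (illustrative, not Superface's code)."""

    tool_name: str
    operational_knowledge: str
    domain_expertise: str
    agent_preferences: str
    reason: Callable[[str], str]  # Stand-in for the Specialist's own LLM.
    loaded_context: dict[str, str] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)

    def build_prompt(self, request: str) -> str:
        """Combine every knowledge source with the client agent's request."""
        context = "; ".join(f"{k}={v}" for k, v in self.loaded_context.items())
        return "\n".join([
            f"Tool: {self.tool_name}",
            f"Operational knowledge: {self.operational_knowledge}",
            f"Domain expertise: {self.domain_expertise}",
            f"Loaded context: {context or 'none'}",
            f"Agent preferences: {self.agent_preferences}",
            f"Memory: {'; '.join(self.memory) or 'none'}",
            f"Request: {request}",
        ])

    def learn(self, feedback: str) -> None:
        """Store a preference learnt from feedback for later requests."""
        self.memory.append(feedback)

    def handle(self, request: str) -> str:
        """Run the reasoning step over the combined prompt and return its answer."""
        return self.reason(self.build_prompt(request))


def stub_reasoner(prompt: str) -> str:
    # Placeholder for a real model call; it only reports what it was given.
    return f"(stub) received {len(prompt.splitlines())} prompt sections"


specialist = Specialist(
    tool_name="spreadsheet",
    operational_knowledge="Rows are appended below the header row.",
    domain_expertise="Contact lists keep one person per row.",
    agent_preferences="Never delete existing rows.",
    reason=stub_reasoner,
    loaded_context={"columns": "Name, Email, Status"},
)
specialist.learn("User prefers status 'Lead' for new contacts.")
result = specialist.handle("Add Grace (grace@example.com) to my contacts")
print(result)
```

The client agent only ever calls `handle` with a plain-language request; everything else stays inside the Specialist, which is what separates this shape from a flat list of functions.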

Learn more about how Specialists compare to basic tools in our HubSpot agent demo made in CrewAI. Available on GitHub: superfaceai/hubspot-agent-demo

Follow us on X (Twitter) to stay updated on in-depth articles about the architecture, the communication protocol, and overall benefits & limits analyses.

Try out Specialists today

Today, we’re releasing the preview of selected Specialists for HubSpot, Todoist, Google Calendar, and Google Sheets. We’ll be rolling out more Specialists over time.

They are free to use and available to everyone.

Specialists currently support connection to custom GPTs. Follow us on GitHub to get notified once the API and SDKs for using Specialists from agentic frameworks are released.

Do you want to up-skill your AI agents?

We make AI agents more capable, accurate and reliable.

Talk to us if you seek to:

shorten time-to-integration for any number of tools
increase value per agent interaction, with a higher goal completion rate
build customer trust through higher reliability

Book a call
