Agentic AI

Pro Tips

LLMs

LLM Tools: Connecting AI agents to APIs in 1 minute

May 23, 2025

Jakub Vacek

Software engineer

The success of AI agents often hinges on robust connections to external, real-world systems. But someone has to build these tool connections.

As an agent builder, you have roughly the following options:

  1. Create the tools yourself, manually. Very time-consuming, expensive, and not scalable, but capable of great results if you know the target systems.

  2. Vibecode the tools in Cursor. It feels productive, but reliable tools still require knowledge of the target systems, which will ultimately slow you down.

  3. Find the right MCP server. There are already thousands of MCP servers, but since they are often auto-generated, their quality falls below that of vibecoded tools. Finding a good MCP server is a needle-in-a-haystack problem. Expect to burn time.

  4. Find an integration platform. Buying the integration is often a good option; however, not all platforms are created equal, and you will spend money and time finding the right one.

Industry surveys show that around 75% of agent runs fail on basic business tasks, usually because the right tool doesn’t exist or can’t be trusted.

Reliable, on-demand generation of LLM-ready tools is still the missing layer.

Edgar: Autonomous AI Agent

To address this we have built a system we call Edgar. Edgar is an autonomous AI agent that can connect your AI agent to any external API. As an agent, Edgar specializes in creating tools that are optimized for use with LLMs. A user feeds it an API source name or just a description of an action like “Create new deal in HubSpot”. Edgar produces one or more tools that are ready to be consumed by your Custom GPT or AI agent via our API, SDKs, or an MCP server.

→ You can try out the first beta version at Superface



Multi-agent under the hood

Edgar is a multi-agent system that generates LLM tools. Any LLM tool in Superface can be broken down into three specific files:

  • Service Configuration File – Defines security requirements and base URLs for all tools using this service.

  • Tool Interface – Defines the input, output, and error interface.

  • Tool Code – Contains the actual integration code that calls the API and returns data matching the interface.

This separation lets Edgar focus on specific parts of the problem and allows existing files to be reused.
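To make the split concrete, here is a hypothetical sketch of the three files for a “create deal in HubSpot” tool. The schema format and field names are illustrative, not Superface's actual file formats; only the HubSpot CRM endpoint itself is real.

```python
import requests

# 1. Service configuration: security scheme and base URL shared by every HubSpot tool.
SERVICE_CONFIG = {
    "service": "hubspot",
    "baseUrl": "https://api.hubapi.com",
    "security": {"type": "http", "scheme": "bearer"},
}

# 2. Tool interface: the input, output, and error shapes the agent sees.
CREATE_DEAL_INTERFACE = {
    "input": {"dealName": "string", "amount": "number (optional)"},
    "output": {"dealId": "string", "createdAt": "string"},
    "error": {"code": "string", "message": "string"},
}

# 3. Tool code: the integration that calls the API and returns data matching the interface.
def create_deal(inputs: dict, config: dict, token: str) -> dict:
    response = requests.post(
        f"{config['baseUrl']}/crm/v3/objects/deals",
        headers={"Authorization": f"Bearer {token}"},
        json={"properties": {"dealname": inputs["dealName"], "amount": inputs.get("amount")}},
        timeout=30,
    )
    response.raise_for_status()
    body = response.json()
    return {"dealId": body["id"], "createdAt": body["createdAt"]}
```

Because the service configuration and the interface are separate artifacts, a second HubSpot tool can reuse the first file and only needs the last two generated.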

Edgar itself consists of these agents:

  • Main Agent – Orchestrates the entire process. Handles job routing and database checks.

  • Service Configuration Agent – Searches for documentation online and builds a config file for the API.

  • New Action Agent – Drafts the interface and code for a new tool using live API data from Run Action Agent.

  • Run Action Agent – Executes the tool, captures HTTP responses, and proposes fixes if anything breaks.

Tools used by agents range from simple database writes to complex interactions with live APIs. Agents work in parallel where possible to minimize latency. The diagram below breaks the system into agents and tools.


This system design roughly translates to this user flow: 

  • The user enters a service name, base URL, or action description.

  • Edgar checks existing API service configurations in the database. If a configuration is found, it is used right away. If not, Edgar researches possible API services and asks the user to deduplicate and select the correct one.

  • The system prepares the service configuration file using web search tools and deterministic checks.

  • If the user doesn't specify any action, Edgar researches the structure of the API (entities, endpoints) and asks the user to select one or more entities they want to work with.

  • For each selected action, Edgar runs a New Action Agent (in parallel).

  • The New Action Agent checks existing actions; if no matching action is found, it researches the API documentation and writes the action's interface and code.

  • Edgar provides a summary of all new action runs.
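The flow above can be condensed into a small orchestration loop. The helper names (find_service_config, research_services, new_action_agent, and so on) are hypothetical stand-ins for Edgar's agents and tools, not real Superface APIs.

```python
import asyncio

async def edgar(service_hint: str, actions: list[str]) -> dict:
    # Check existing API service configurations in the database first.
    config = find_service_config(service_hint)
    if config is None:
        candidates = research_services(service_hint)    # web research
        selected = ask_user_to_select(candidates)       # deduplicate / confirm with the user
        config = build_service_config(selected)         # web search + deterministic checks

    # If no action was specified, research the API structure and let the user pick entities.
    if not actions:
        entities = research_api_structure(config)
        actions = ask_user_to_select_entities(entities)

    # Run one New Action Agent per selected action, in parallel.
    results = await asyncio.gather(
        *(new_action_agent(config, action) for action in actions)
    )
    return summarize(results)
```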

Recent open-source and academic work validates this decomposed, specialized-agent design. The approach provides great flexibility, but it increases the cost of running this kind of service and brings its own set of challenges and problems.

Challenges and lessons learned

Reliable tool generation sits at the intersection of documentation, model limits, architecture, and real-world execution:

Getting the right docs

Keeping API references fresh was harder than writing code. Our first obstacle was simply finding the current docs for the exact service we were about to generate a tool for; mainstream web search engines are tuned for general-purpose relevance, so they often surface blog posts or marketing pages instead of the API reference we need. Testing some of the commercial search APIs confirmed the issue. To solve it we built a simple retrieval system, which we later replaced with the built-in search tools many LLM providers now expose: the model can issue targeted queries mid-prompt and stream the results back into the generation chain. In practice this yields fresher docs with less boilerplate code and gives Edgar the ability to search the web for specific issues during tool generation.
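As a rough illustration, using a provider-hosted search tool can be as simple as enabling it on a request. The snippet below uses OpenAI's Responses API with its built-in web-search tool; the model choice and query are illustrative, and the post doesn't tie Edgar to a specific provider.

```python
from openai import OpenAI

client = OpenAI()

# The model issues its own targeted queries and the results flow straight back
# into the generation chain, instead of us maintaining a retrieval pipeline.
response = client.responses.create(
    model="gpt-4.1",  # illustrative model choice
    tools=[{"type": "web_search_preview"}],
    input="Find the official API reference for creating a deal in HubSpot (CRM v3 deals endpoint).",
)
print(response.output_text)
```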

Living within the context window

GPT-4-class models can now consume hundreds of thousands of tokens, but pricing and latency make restraint essential. We architect each agent to operate under explicit token limits, in line with community guidance on context-window budgeting.
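A minimal sketch of that kind of budgeting, assuming documentation chunks are already ranked by relevance; the 20k-token limit is an arbitrary illustration, not Edgar's real budget.

```python
import tiktoken

MAX_CONTEXT_TOKENS = 20_000  # illustrative budget, not Edgar's actual limit

def fit_docs_to_budget(ranked_chunks: list[str], budget: int = MAX_CONTEXT_TOKENS) -> list[str]:
    """Keep the highest-ranked documentation chunks until the token budget is spent."""
    enc = tiktoken.get_encoding("cl100k_base")
    kept, used = [], 0
    for chunk in ranked_chunks:
        cost = len(enc.encode(chunk))
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```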

Choosing the right tool abstraction

A 1:1 endpoint map is fine for SDKs or for very strong (future) agents; current agents, however, need intent-level actions (“create-invoice”, “send-email”). Edgar clusters endpoints where needed, analyzes usage examples, and synthesizes higher-level functions, an approach echoed by recent research into LLM-made tools.
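For example, an intent-level “create-invoice” action might hide two raw endpoints behind one agent-facing function. Everything below (the billing API, its endpoints, the field names) is hypothetical and only illustrates the clustering idea.

```python
import requests

BASE_URL = "https://api.example-billing.com"  # hypothetical API

def create_invoice(customer_email: str, amount_cents: int, currency: str = "USD") -> dict:
    """One intent-level action that wraps two endpoints the agent never sees."""
    # 1. Resolve the customer id from the email.
    customers = requests.get(
        f"{BASE_URL}/v1/customers", params={"email": customer_email}, timeout=30
    )
    customers.raise_for_status()
    customer_id = customers.json()["items"][0]["id"]

    # 2. Create the invoice against the resolved id.
    invoice = requests.post(
        f"{BASE_URL}/v1/invoices",
        json={"customer_id": customer_id, "amount": amount_cents, "currency": currency},
        timeout=30,
    )
    invoice.raise_for_status()
    return invoice.json()
```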

Tool quality 

A connector is only as useful as its contract, so every tool we generate ships with metadata that carries a concise natural-language description: one or two sentences that explain the business intent (“create a draft invoice in Xero”) and list any upstream or peer tools it depends on (“requires get-customer-by-id”). Modern LLM runtimes rank candidate functions almost entirely on that text, so precision here directly drives call accuracy. Next, the schema's input block is meticulously documented: each property gets a human-readable hint (“customerId: UUID returned by get-customer-by-id”), allowed values, and examples. The same rules are highlighted in OpenAI's function-calling guide. By baking these rich descriptors into every connector we get two wins: agents can pick the right tool, and human developers can debug or extend it without spelunking through generated code.
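In practice that contract looks something like the definition below, written in the OpenAI function-calling format; the tool itself (a Xero draft-invoice connector) is a hypothetical example.

```python
# Illustrative tool definition: the descriptions carry the business intent,
# dependencies, and per-property hints discussed above.
CREATE_DRAFT_INVOICE_TOOL = {
    "type": "function",
    "function": {
        "name": "create_draft_invoice",
        "description": (
            "Create a draft invoice in Xero for an existing customer. "
            "Requires the customer id returned by get-customer-by-id."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "customerId": {
                    "type": "string",
                    "description": "UUID returned by get-customer-by-id.",
                },
                "currency": {
                    "type": "string",
                    "enum": ["USD", "EUR", "GBP"],
                    "description": "ISO 4217 currency code, e.g. 'EUR'.",
                },
                "lineItems": {
                    "type": "array",
                    "items": {"type": "object"},
                    "description": "Invoice lines; each needs description, quantity, unitAmount.",
                },
            },
            "required": ["customerId", "lineItems"],
        },
    },
}
```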

Feedback loops

Edgar can call every generated tool with example data against the live API and feed the execution logs back into the authoring loop. This greatly increases the success rate of generated tools, but it also introduces the need for multiple API accounts (one for testing and one for production).
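A minimal sketch of that execute-and-repair loop. The helpers run_tool, summarize_http_log, and ask_model_to_fix are hypothetical stand-ins for the Run Action Agent and the authoring agents.

```python
def generate_with_feedback(tool_code: str, example_input: dict, max_attempts: int = 3) -> str:
    """Run the generated tool against a test account and repair it from execution logs."""
    for _attempt in range(max_attempts):
        result = run_tool(tool_code, example_input)   # executes against the live API (test account)
        if result.ok:
            return tool_code                          # tool verified end to end
        # Feed the captured HTTP responses back into the authoring loop.
        log = summarize_http_log(result.http_trace)
        tool_code = ask_model_to_fix(tool_code, log)
    raise RuntimeError("Tool still failing after repair attempts")
```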

Agent & tools interfaces 

Function calling works by having the LLM return structured JSON with inputs for the tool or agent. But as the number or complexity of parameters grows, generation slows down—and the risk of bad or redundant values goes up. If you already have certain inputs (like order_id from a previous step), don’t ask the model to re-output them. Just pass them directly in code. Same goes for tightly coupled steps: if two functions always run back-to-back (e.g. query_location() then mark_location()), consider merging the logic into a single call to simplify things for the model.
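A sketch of both ideas, reusing the post's query_location / mark_location example; the helpers are hypothetical.

```python
# Merge two tightly coupled steps into a single tool and inject known inputs in code,
# so the model never has to re-emit order_id or chain two calls.
def query_and_mark_location(location_name: str, *, order_id: str) -> dict:
    """Hypothetical merged tool: look up a location and mark it for an order in one call."""
    location = query_location(location_name)        # hypothetical helper
    return mark_location(location["id"], order_id)  # hypothetical helper

# The model only fills in location_name; order_id from a previous step is passed by the host code:
# result = query_and_mark_location(tool_args["location_name"], order_id=known_order_id)
```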

Generating security configuration

Security is one of the biggest challenges in the agentic AI domain. LLM tool generation faces it from a different angle: how do you generate security configuration files without having the AI write security-related code, even for API providers without a standardized (OAuth) authentication flow?
Superface deals with this problem by combining a specific provider configuration file that covers the common security schemes with a very general provider configuration that covers many (but still not all) edge cases.
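A hypothetical sketch of such a configuration, with illustrative field names (not Superface's actual format). The key point is that the artifact is declarative, so no AI-written security code ever executes.

```python
# Illustrative provider security configuration for a bearer-token API,
# with a generic API-key header entry as a fallback for edge cases.
PROVIDER_CONFIG = {
    "name": "hubspot",
    "baseUrl": "https://api.hubapi.com",
    "security": [
        {"id": "bearer", "type": "http", "scheme": "bearer"},                       # common scheme
        {"id": "apiKeyHeader", "type": "apiKey", "in": "header", "name": "X-API-Key"},  # generic fallback
    ],
}
```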

A lightweight agentic framework

After trials with popular OSS frameworks, we opted for a very simple in-house abstraction covering just the essentials: state, tools, and an event bus. It lets us swap model providers and run concurrent tool generations without dependency lock-in or hard-to-debug third-party code.
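A toy version of that abstraction, for illustration only (not Edgar's actual code): shared state, a tool wrapper, and a tiny event bus.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class AgentState:
    """Shared, serializable state passed between agents."""
    data: dict[str, Any] = field(default_factory=dict)

@dataclass
class Tool:
    """A named capability an agent can invoke against the current state."""
    name: str
    run: Callable[[AgentState, dict], Any]

class EventBus:
    """Tiny pub/sub so concurrent agents can react to each other's progress."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        for handler in self._subscribers.get(topic, []):
            handler(payload)
```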

Conclusion

LLMs are getting better at reasoning and planning, but they still fall apart when the tools they need aren't there or don't behave as expected. That's the real bottleneck. Edgar proves that it's possible to build agents that are autonomous and reliable in real-world situations. By combining retrieval, specialized agents, deterministic validation, and context-aware generation, we've shown that agents can be equipped to handle complex tasks and finish them successfully. If you're building serious AI products, tools aren't just a backend detail; they're the foundation that makes everything else work.

References

  1. Superface Website https://superface.ai/

  2. Agent reality gap https://superface.ai/blog/agent-reality-gap

  3. Large Language Models as Tool Makers https://arxiv.org/abs/2305.17126

  4. OpenAI function calling https://platform.openai.com/docs/guides/function-calling#best-practices-for-defining-functions

Do you want to up-skill your AI agents?

We make AI agents more capable, accurate and reliable.

Talk to us if you seek to:

  • shorten time-to-integration for any number of tools

  • increase value per agent interaction, with higher goal completion rate

  • build customer trust using higher reliability