Key takeaways:
- MCP isn’t a shortcut – it’s production infrastructure. The same rigor as enterprise APIs applies.
- Design for agents, not developers: nested hierarchies and verbose responses break down when the caller is autonomous. MCP interfaces need to be high-intent and single-call, with responses structured so that agents can self-correct.
- Agent-driven traffic is fundamentally unpredictable.
Model Context Protocol (MCP) was introduced by Anthropic in November 2024 and has developed rapidly since then. The fast evolution of agents based on Large Language Models (LLMs) has pushed MCP from experimental curiosity to the center of production systems faster than most developers anticipated.
As of Q1 2026, many big tech companies that own enterprise Application Programming Interfaces (APIs), such as Google and Microsoft, have launched official MCP servers.
Moving from enterprise APIs to MCP takes more than swapping in a prototype or adding a shallow wrapper around an existing interface. MCP demands the same rigor as enterprise APIs, because both are production infrastructure. MCP developers should also give careful consideration to MCP interface design.
Fundamental interface differences between API and MCP
Enterprise APIs are designed for traffic from deterministic software systems. For MCP servers, the callers are autonomous agents that decide for themselves which tool or MCP server to invoke. This leads to some major differences in interface design principles between APIs and MCP.
Let’s take this nested resource structure as an example: departments/{department_id}/teams/{team_id}/members/{member_id}.
This represents a common pattern for enterprise APIs. Developers would first need to implement the logic to get the resource IDs from different hierarchies, then use the IDs to get the final resource – member information associated with a specific member ID.
In this case, we first need to get the department ID, typically through a separate lookup call that maps a department name to its ID. We then repeat the same step at each level until we reach the last layer of the resource hierarchy.
This design pattern requires multiple round trips just to get the member information associated with one member ID. That matters especially for AI agents, since each call adds latency and consumes more of the agent's context window, increasing the likelihood of exhausting it.
MCP developers should not follow the same resource-oriented design pattern. Instead, the MCP interface should be as high-intent and efficient as possible. In the example above, each {id} should be a parameter of a single MCP tool call, which reduces the round trips to one. When errors occur, the MCP response should not simply return an error code but a structured explanation of what failed and, if applicable, what a valid request should look like. This allows agents to self-correct and retry efficiently.
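As a minimal sketch of this idea, the Python function below stands in for a single MCP tool handler. It is not tied to any MCP SDK: the `get_member` name, the in-memory `DIRECTORY`, and the error fields (`error`, `message`, `valid_values`) are hypothetical. All three IDs arrive in one call, and each failure returns a machine-readable explanation plus the values that would have been accepted.

```python
from typing import Any

# Hypothetical in-memory data standing in for the backing enterprise API.
DIRECTORY: dict[str, dict[str, dict[str, dict[str, str]]]] = {
    "eng": {"platform": {"m-42": {"name": "Ada", "role": "SRE"}}},
}


def _error(code: str, message: str, field: str, valid: list[str]) -> dict[str, Any]:
    # Structured error: what failed, plus what a valid request could contain.
    return {"ok": False, "error": code, "message": message, "valid_values": {field: valid}}


def get_member(department_id: str, team_id: str, member_id: str) -> dict[str, Any]:
    """One high-intent call: every ID in the hierarchy is a parameter."""
    teams = DIRECTORY.get(department_id)
    if teams is None:
        return _error("unknown_department", f"No department {department_id!r}.",
                      "department_id", sorted(DIRECTORY))
    members = teams.get(team_id)
    if members is None:
        return _error("unknown_team", f"No team {team_id!r} in {department_id!r}.",
                      "team_id", sorted(teams))
    member = members.get(member_id)
    if member is None:
        return _error("unknown_member", f"No member {member_id!r} in {team_id!r}.",
                      "member_id", sorted(members))
    return {"ok": True, "member": {"id": member_id, **member}}
```

In a real server, the suggested `valid_values` should be limited to what the caller is authorized to see, which ties into the authentication concerns discussed in the security section.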
MCP developers should also evaluate the quality of their MCP interface design. Teams can run A/B tests of two different interfaces, collect the results, and analyze which one yields more accurate results. Decisions about what works and what doesn't should be grounded in data rather than in human judgment alone.
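A minimal sketch of the analysis step, assuming each evaluation trial records which interface variant the agent was given, whether it completed the task correctly, and how many MCP calls it made. The `Trial` record and its fields are hypothetical, and how the trials are generated (which agent, which task set) is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    variant: str      # which interface design the agent was given, e.g. "A" or "B"
    succeeded: bool   # did the agent complete the task correctly?
    tool_calls: int   # how many MCP calls it needed


def summarize(trials: list[Trial]) -> dict[str, dict[str, float]]:
    """Success rate and average MCP calls per task, per interface variant."""
    totals = defaultdict(lambda: {"n": 0, "success": 0, "calls": 0})
    for t in trials:
        agg = totals[t.variant]
        agg["n"] += 1
        agg["success"] += int(t.succeeded)
        agg["calls"] += t.tool_calls
    return {
        variant: {
            "success_rate": agg["success"] / agg["n"],
            "avg_tool_calls": agg["calls"] / agg["n"],
        }
        for variant, agg in totals.items()
    }
```

With small trial counts, differences in success rate can be noise, so a significance test or confidence interval is worth adding before declaring a winner.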
Security and authentication
Authentication for MCP requires finer granularity. When an agent calls an MCP server on behalf of a user or a system, simply handing it a single shared credential erases the agentic layer: logging and monitoring would have no way to distinguish calls made by different agents or sub-agents.
A production-grade approach for MCP should follow the OAuth 2.0 delegation model. Instead of a broad service credential, the agent should use a scoped token that reflects what the user has explicitly authorized for that specific session (think of an agent session as similar to a web browser session).
Within a session, the agent developer can grant agent A access to resource A only, and agent B access to resource B only. Once the session expires, the authorization expires with it. This prevents agents from accessing systems and performing actions they were never authorized for. It also leaves the audit trail that enterprise production environments require.
If something goes wrong, you can trace back to the exact session and its authorization settings by loading the logs associated with that session ID. Generally speaking, authentication for MCP follows principles similar to access control for enterprise APIs, but with finer granularity.
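The sketch below shows the per-call check an MCP server might run after an OAuth 2.0 access token has already been validated and its claims decoded. It does not implement OAuth itself, and the `SessionGrant` fields and scope strings such as `resource_a:read` are hypothetical. Each call is checked against the session's expiry and the scopes granted to that specific agent; the `session_id` and `agent_id` are what later tie log entries back to the session.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionGrant:
    """Hypothetical claims decoded from a validated, session-scoped access token."""
    session_id: str
    agent_id: str
    scopes: frozenset  # e.g. frozenset({"resource_a:read"})
    expires_at: float  # Unix timestamp; the grant ends with the session


def authorize(grant: SessionGrant, required_scope: str,
              now: Optional[float] = None) -> tuple[bool, str]:
    """Return (allowed, reason); the reason is suitable for an audit log entry."""
    now = time.time() if now is None else now
    if now >= grant.expires_at:
        return False, "session_expired"
    if required_scope not in grant.scopes:
        return False, f"missing_scope:{required_scope}"
    return True, "ok"
```

Agent B would receive its own `SessionGrant` with a different `agent_id` and only the `resource_b` scopes, so a call it makes against resource A is rejected and logged under its own identity.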
Aside from authentication, response sanitization is equally important, especially since LLMs become more prone to hallucination as their context grows. Excessive information in an MCP response can therefore have a large impact on the agent's context.
Fields such as IP addresses, excessive reasoning traces, or verbose system logs not only create security exposure but also increase the likelihood that the agent incorporates less relevant information into its answer. The more context there is, the more likely the LLM is to hallucinate. Hence, if a field does not help the agent take its next step, it probably should not be in the MCP response. MCP responses should be designed to enrich the agent's reasoning context efficiently so that it can continue its reasoning chain.
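One way to apply this rule is an explicit allowlist at the boundary of the MCP server, so that only fields the agent needs ever reach its context. The field names below are hypothetical.

```python
from typing import Any

# Hypothetical allowlist: only fields that help the agent take its next step.
ALLOWED_FIELDS = frozenset({"id", "name", "role", "team_id"})


def sanitize(record: dict[str, Any], allowed: frozenset = ALLOWED_FIELDS) -> dict[str, Any]:
    """Drop every field not explicitly allowed (e.g. IPs, debug logs, internal traces)."""
    return {key: value for key, value in record.items() if key in allowed}
```

An allowlist fails closed: if the backend later adds a new internal field, it stays out of the response until someone decides the agent needs it, whereas a denylist would pass it through by default.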
Latency and logging
Enterprise APIs usually have extensive Service Level Objective (SLO) requirements, e.g. latency targets at the 50th, 90th, 99th, or 99.9th percentile. These SLOs should carry over to MCP. Even though an MCP server's traffic comes from agents, the server should keep the same latency promises as an enterprise API.
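To make the percentile targets concrete, the sketch below computes nearest-rank percentiles over a window of MCP call latencies and compares them with per-percentile thresholds. The thresholds in the example are hypothetical, and production systems usually derive percentiles from histograms in their metrics backend rather than from raw samples.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, with p in (0, 100]."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def check_slo(samples_ms: list[float], targets_ms: dict[float, float]) -> dict[float, bool]:
    """Map each percentile to whether its observed latency meets the target."""
    return {p: percentile(samples_ms, p) <= limit for p, limit in targets_ms.items()}
```

For example, `check_slo(latencies_ms, {50: 80.0, 99: 400.0, 99.9: 900.0})` reports which of the p50, p99, and p99.9 targets the current window meets.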
MCP call latency is usually less of a bottleneck than LLM call latency. MCP calls typically take tens to hundreds of milliseconds, whereas LLM calls can take thousands to tens of thousands of milliseconds. If specific methods are too slow, the agent developers affected should file feature requests with the MCP server owners, and the two parties should agree on latency requirements.
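A quick back-of-the-envelope check shows why, assuming the agent makes its LLM and MCP calls sequentially. The numbers below (three LLM calls of 8 seconds each and four MCP calls of 150 ms each) are illustrative, not measurements.

```python
def mcp_latency_share(llm_calls_ms: list[float], mcp_calls_ms: list[float]) -> float:
    """Fraction of a task's sequential wall-clock time spent waiting on MCP calls."""
    mcp_total = sum(mcp_calls_ms)
    total = mcp_total + sum(llm_calls_ms)
    return mcp_total / total if total else 0.0


share = mcp_latency_share([8000.0] * 3, [150.0] * 4)
print(f"{share:.1%}")  # prints 2.4%
```

Under these assumptions MCP accounts for about 2.4% of the task time. A method that took several seconds, however, would quickly become comparable to an LLM call, which is when a latency agreement with the server owner is worth pursuing.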
Monitoring and observability call for similar precautions, with the added opportunity that extra MCP-specific monitoring and logging can help agent developers tremendously. Traditional enterprise API logging records requests, responses, timestamps, and authentication, and monitors Queries Per Second (QPS), latency, throughput, and so on. In that setting, each call is treated as a standalone, self-contained action. This is not the case for MCP.
In an agentic flow, an agent may call the same MCP server more than once as part of self-reflection and iteration. Logging MCP calls the same way enterprise APIs do ignores the nature of this agentic loop.
Instead, consider capturing context that helps reconstruct the reasoning chain. For instance, with scoped OAuth tokens, developers can group together all calls originating from the same agent session. Because each session is tied to a unique, time-bound credential, the MCP server can use that authentication metadata to sequence tool calls and reasoning steps accurately. Analyzing such server-side logs can give MCP developers valuable insight into how their MCP servers are used.
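A minimal sketch of that server-side analysis, assuming each log entry already carries the session ID and agent ID taken from the scoped token's claims. The `ToolCallLog` record and its fields are hypothetical. Grouping by session and sorting by timestamp recovers each agent's sequence of calls, and repeated calls to the same tool within one session are a simple signal of the self-reflection and retry loops described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCallLog:
    session_id: str   # from the scoped token's claims (hypothetical field)
    agent_id: str
    tool: str
    timestamp: float
    ok: bool


def reconstruct_sessions(logs: list[ToolCallLog]) -> dict[str, list[ToolCallLog]]:
    """Group calls by session and order each session's calls by time."""
    sessions: dict[str, list[ToolCallLog]] = defaultdict(list)
    for entry in logs:
        sessions[entry.session_id].append(entry)
    return {sid: sorted(calls, key=lambda c: c.timestamp) for sid, calls in sessions.items()}


def retried_tools(calls: list[ToolCallLog]) -> set[str]:
    """Tools the agent called more than once within one session."""
    seen: set[str] = set()
    repeated: set[str] = set()
    for call in calls:
        if call.tool in seen:
            repeated.add(call.tool)
        seen.add(call.tool)
    return repeated
```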
Scalability and capacity planning
MCP requires scaling and capacity planning like enterprise APIs do. However, agent-driven traffic (for MCP) has fundamental differences from human-driven traffic (for API).
Agent-driven traffic can be bounded by the rate limits and the latency of the LLM call that precedes each MCP call. Most LLM API providers offer tiers with different rate limits. For example, a basic tier might allow 10 QPS to the LLM API with a 10,000-token limit, whereas an ultra tier might allow 100 QPS with a 100,000-token limit.
As a result, an agent running under the ultra tier could send 10x or more calls to the MCP server than one on the basic tier over the same period, which makes it harder to estimate the traffic throughput and the computing resources the MCP server needs.
Capacity planning for MCP servers also has to account for fan-out: one user prompt can turn into a dozen tool calls within seconds. This creates bursty, unpredictable load that the MCP server's infrastructure might not be designed to handle. Many cloud providers offer auto-scaling; for example, AWS EC2 and DynamoDB both support it. Auto-scaling lets the infrastructure acquire more computing resources under load and release idle or unused resources when demand drops.
However, it is up to the MCP developer to choose the infrastructure carefully and configure its scaling behavior.
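As a rough upper bound, assume every MCP call is triggered by an LLM response, so each account's MCP traffic cannot exceed its LLM rate limit multiplied by the largest number of tool calls a single response can fan out into. The sketch reuses the illustrative 10 and 100 QPS tier limits above; the account counts and the fan-out factor of 4 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierUsage:
    tier: str
    llm_qps_limit: float  # provider rate limit per account on this tier
    accounts: int         # accounts on this tier calling the MCP server


def peak_mcp_qps(usage: list[TierUsage], max_tool_calls_per_response: int) -> float:
    """Worst case: every account runs at its LLM rate limit and every
    LLM response fans out into the maximum number of tool calls."""
    return sum(u.llm_qps_limit * u.accounts * max_tool_calls_per_response for u in usage)


usage = [TierUsage("basic", 10.0, 3), TierUsage("ultra", 100.0, 1)]
print(peak_mcp_qps(usage, max_tool_calls_per_response=4))  # prints 520.0
```

A bound like this is deliberately pessimistic; real traffic will usually sit well below it, but it gives auto-scaling limits and load tests a concrete ceiling to aim at.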
Conventional load tests, which increase the load on the server under test gradually, won't accurately mimic this agent-driven traffic. Instead, teams should run spike tests with abruptly increasing load to stress test the MCP server before moving to production.
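The difference between the two test shapes can be sketched as per-second request targets. The function names and parameters are hypothetical, and the resulting schedules would still need to be fed into whatever load-testing tool the team already uses. What matters is the largest second-to-second jump: a ramp never asks the server to absorb more than a small increment at once, while a spike does.

```python
def ramp_profile(peak_qps: int, duration_s: int) -> list[int]:
    """Conventional load test: target QPS rises linearly to the peak."""
    return [round(peak_qps * (s + 1) / duration_s) for s in range(duration_s)]


def spike_profile(base_qps: int, peak_qps: int, duration_s: int,
                  spike_at: int, spike_len: int) -> list[int]:
    """Agent-like burst: steady baseline, then an abrupt jump to peak with no warm-up."""
    return [peak_qps if spike_at <= s < spike_at + spike_len else base_qps
            for s in range(duration_s)]


def max_step_increase(profile: list[int]) -> int:
    """Largest second-to-second increase in target QPS (how abrupt the load is)."""
    return max((b - a for a, b in zip(profile, profile[1:])), default=0)


print(max_step_increase(ramp_profile(200, 20)))             # prints 10
print(max_step_increase(spike_profile(5, 200, 20, 10, 3)))  # prints 195
```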

MCP in the era of APIs and AI agents
MCP servers are not simply wrappers around enterprise APIs. Still, many of the same engineering principles apply and should be carried over: security, scalability, latency, interface design, logging, and monitoring.
MCP should be built with the same production rigor as any enterprise service. The difference is that MCP servers serve traffic from AI agents rather than deterministic programs, which calls for careful judgment on each of the aspects above.