The discovery problem in a sea of specialists
Imagine you need to renew your subscription. You open your company's agent directory and see hundreds of AI agents staring back at you: billing-support, sales-assistant, delivery-support, tech-assistant, HR-assistant... Which one handles subscription renewals?
You click on billing-support. After a few exchanges, you realize this agent handles invoices and refunds, not renewals. You try sales-assistant. Ten minutes later, you're still bouncing between agents, hoping to find the right specialist for your seemingly simple question.
This is the reality of multi-agent ecosystems today. At GoDaddy, we're rapidly expanding our AI agent fleet - adding 1 to 10 new specialized agents every week. These agents serve both our customers and internal employees, each expertly handling specific domains like domain transfers, SSL certificates, hosting configurations, and billing inquiries. But as our agent ecosystem grew, we faced a critical challenge: discovery.
Users shouldn't need to become experts in our agent architecture just to get help. They need what every organization has in the physical world - a front desk that understands their question and routes them to the right specialist.
The challenge of automatic routing at scale
AWS Strands SDK provides excellent Agent-to-Agent (A2A) communication capabilities through its A2AClientToolProvider interface. A2A is an emerging standard protocol that lets agents communicate with each other much as humans do - with natural language understanding and context preservation. The Strands SDK implements this protocol: you give it a list of agent URLs, and it automatically routes queries to the appropriate agents.
However, in our production environment, we encountered limitations that made the out-of-the-box routing unsuitable for our scale:
- Performance bottleneck: The SDK scans all sub-agents for every query, making HTTP calls to read each agent's capabilities before selecting one. With hundreds of agents in our Agent Name Service (ANS - GoDaddy's internal and external agent registry for agent discovery and invocation), this took two to three seconds per routing decision. Worse, subsequent messages in the same conversation still triggered full scans.
- Black box decisions: When routing selected an agent, we couldn't understand why - a problem for debugging misroutes and maintaining audit trails in enterprise deployments.
- Limited customization: Business rules and custom routing logic were difficult to implement without modifying the core SDK.
- No conversation context: The SDK treated each message independently, with no awareness that a user was mid-conversation with a specific agent.
An intelligent routing solution built on proven infrastructure
Rather than abandoning Strands SDK, we built an intelligent routing layer on top of it. This approach lets us leverage the SDK's robust A2A protocol implementation while adding the control and performance we needed. We call it the Platform Routing Agent.
Platform routing agent architecture
The following diagram illustrates the architecture for our Platform Routing Agent:
Our architecture maintains clean separation between concerns:
- Selection Layer: LLM-based intelligent routing decisions
- Communication Layer: Strands SDK handles all A2A protocol details
- Caching Layer: Conversation-aware performance optimization
How dual-agent routing works
The following sections describe how our Platform Routing Agent works.
1. In-memory agent registry
At startup, we fetch all agent metadata from ANS and load it into memory. Each agent provides an "agent card" - essentially a business card in JSON format containing:
- Agent name and description
- List of skills and capabilities
- Specialization areas
- Communication endpoints
This eliminates runtime lookups. With agent data already in memory, routing decisions complete in under 500ms (compared to two or three seconds with repeated HTTP calls).
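A minimal sketch of this pattern is shown below. The class and field names are illustrative, not the actual ANS schema; in production, the card payload would be fetched from ANS once at startup rather than passed in directly.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class AgentCard:
    """Subset of the agent-card fields described above (names are illustrative)."""
    agent_id: str
    name: str
    description: str
    skills: list[str] = field(default_factory=list)
    endpoint: str = ""

class AgentRegistry:
    """In-memory registry populated once at startup, so routing decisions
    never make per-query HTTP calls to read agent capabilities."""

    def __init__(self) -> None:
        self._agents: dict[str, AgentCard] = {}

    def load(self, cards: list[dict]) -> None:
        # In production this payload would come from ANS at startup.
        for raw in cards:
            card = AgentCard(
                agent_id=raw["agent_id"],
                name=raw["name"],
                description=raw["description"],
                skills=raw.get("skills", []),
                endpoint=raw.get("endpoint", ""),
            )
            self._agents[card.agent_id] = card

    def all_cards(self) -> list[AgentCard]:
        return list(self._agents.values())

    def get(self, agent_id: str) -> AgentCard | None:
        return self._agents.get(agent_id)
```

Because the registry holds only lightweight metadata, reloading it when agents are added or modified is cheap.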
2. Dual-LLM scoring system
Instead of relying on black-box automatic routing, we use two independent LLM agents working together:
Agent Selector (GPT-4o): Analyzes the user's query against all available agents. It receives each agent's name and description, then scores every agent on a scale from 0 to 1 based on how well it matches the query.
For example, when a user asks: "I need help transferring my domain to another registrar"
Selector scores:
domain-transfer-agent: 0.92
Reasoning: "Query explicitly mentions domain transfer, which is this agent's primary function"
billing-support: 0.35
Reasoning: "Domain transfers may involve billing aspects, but this is not the primary concern"
hosting-support: 0.15
Reasoning: "Hosting is unrelated to domain transfer operations"

Agent Evaluator (Claude Sonnet 4.5): When the top two candidates have close scores, the evaluator provides independent judgment. It receives the full agent cards - including detailed capabilities and skills - and acts as a tie-breaker. We intentionally use a different, more powerful LLM for the evaluator to provide a diverse perspective and higher-quality judgment calls.
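The control flow can be sketched as follows. The selector and evaluator LLM calls are stubbed out as plain callables here, and the tie-break margin is an illustrative value, not our production threshold.

```python
def route(query, cards, selector, evaluator, margin=0.1):
    """Dual-LLM routing sketch.

    selector(query, cards)  -> list of (agent_id, score, reasoning) tuples
    evaluator(query, top_two_ids) -> agent_id chosen by the judge model

    Both stand in for LLM calls (GPT-4o and Claude Sonnet in our setup);
    here they are plain callables so the control flow is testable.
    """
    scores = sorted(selector(query, cards), key=lambda s: s[1], reverse=True)
    best, runner_up = scores[0], scores[1]
    selected, used_judge = best[0], False
    if best[1] - runner_up[1] < margin:
        # Scores too close to call: ask the second, independent model
        # to break the tie using the full agent cards.
        used_judge = True
        selected = evaluator(query, [best[0], runner_up[0]])
    return {
        "selected_agent_id": selected,
        "confidence": best[1],
        "routing_scores": [{"agent_id": a, "score": s} for a, s, _ in scores],
        "used_judge": used_judge,
    }
```

Keeping the judge out of the common path means most queries pay for only one LLM call; the second model runs only on genuinely ambiguous cases.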
This dual-agent approach has shown 90%+ agreement between selector and evaluator, with clear tie-breaking for the remaining ambiguous queries. We've measured a 15% reduction in misrouted queries compared to our previous approach.
3. Transparent decision making
Every routing decision includes complete reasoning:
{
"selected_agent_id": "domain-transfer-agent",
"confidence": 0.92,
"reasoning": "Query explicitly mentions domain transfer operations",
"routing_scores": [
{"agent_id": "domain-transfer-agent", "score": 0.92},
{"agent_id": "billing-support", "score": 0.35},
{"agent_id": "hosting-support", "score": 0.15}
],
"used_judge": false
}

This transparency is invaluable for debugging, compliance, optimization, and trust. It helps us understand why queries were routed incorrectly, maintains an audit trail of automated decisions, surfaces patterns in routing behavior, and gives users and operators visibility into the reasoning behind each decision.
4. Conversation caching
After an agent is selected for a conversation, we cache that decision in memory. Subsequent messages from the same user in the same session automatically route to the previously selected agent - no LLM scoring required.
Conversation caching provides high performance (cache hits complete in under 50ms) and context preservation (agents maintain conversation continuity). Additionally, it offers user isolation (each user's conversation state is independently tracked) and cost efficiency (fewer LLM API calls for ongoing conversations).
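A minimal sketch of such a cache is below. The composite key enforces per-user isolation, and the TTL value is illustrative; our production cache also handles explicit session termination.

```python
import time

class ConversationCache:
    """Maps (user_id, session_id) -> selected agent, so follow-up messages
    in the same session skip LLM scoring entirely. TTL is illustrative."""

    def __init__(self, ttl_seconds: float = 1800.0) -> None:
        self._entries: dict[tuple, tuple] = {}
        self._ttl = ttl_seconds

    def get(self, user_id: str, session_id: str):
        entry = self._entries.get((user_id, session_id))
        if entry is None:
            return None
        agent_id, stored_at = entry
        if time.monotonic() - stored_at > self._ttl:
            # Stale entry: evict and fall back to full LLM routing.
            del self._entries[(user_id, session_id)]
            return None
        return agent_id

    def put(self, user_id: str, session_id: str, agent_id: str) -> None:
        self._entries[(user_id, session_id)] = (agent_id, time.monotonic())
```

On a cache hit the router forwards the message straight to the cached agent; on a miss it falls back to the dual-LLM scoring path.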
5. Full streaming support
We implement complete Server-Sent Events support, enabling real-time response streaming from agents. Users see responses appear progressively rather than waiting for complete answers, significantly improving perceived performance for longer responses.
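The SSE wire format itself is simple: each event is one or more `data:` lines followed by a blank line. The sketch below shows only that framing, with chunks re-emitted as they arrive from a downstream agent; the `[DONE]` sentinel is a common convention, not part of the SSE standard, and a real deployment would write these frames to a streaming HTTP response.

```python
def sse_events(chunks):
    """Format text chunks from a downstream agent as Server-Sent Events
    frames, so the client can render the response progressively."""
    for chunk in chunks:
        # Each SSE frame: one or more `data:` lines ending with a blank line.
        yield f"data: {chunk}\n\n"
    # End-of-stream sentinel (a convention, not part of the SSE spec).
    yield "data: [DONE]\n\n"
```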
What we learned in production
The Platform Routing Agent has been running in GoDaddy's production environment for a couple of months, handling routing decisions across our growing agent ecosystem.
Beyond the latency improvements from in-memory caching and conversation context (discussed earlier), we've focused on measuring routing quality. We track misrouted queries by logging cases where users explicitly request a different agent or abandon conversations immediately after routing. Comparing this against our previous SDK-only approach, we've seen a 15% reduction in misrouting.
The dual-LLM architecture has also proven its value. When we compare the selector's top choice against the evaluator's independent judgment, they agree over 90% of the time. For the remaining 10% of ambiguous queries, having a second opinion from a different model provides meaningful tie-breaking rather than arbitrary selection.
As our agent registry grows, the in-memory architecture scales efficiently - agent metadata is lightweight, making it straightforward to handle hundreds or thousands of specialized agents without performance degradation.
Why this approach?
You might wonder: why use LLMs for routing at all? Why not simpler alternatives? After all, rule-based routing, fine-tuned models, and out-of-the-box solutions like AWS Bedrock Agents all exist. The answer is that each alternative had limitations that didn't fit our production requirements.
Rule-based routing breaks down when queries are ambiguous. Consider: "My site is down" could mean hosting, DNS, or SSL certificate issues. "I can't renew" might involve billing, domain, or subscription problems. "Transfer help needed" could refer to domain transfers, account transfers, or content migration. Keyword matching and rigid rules lack the semantic understanding to handle these nuances - they can't reason about context, synonyms, or user intent the way LLMs can.
Fine-tuning a smaller model seemed appealing for cost reasons, but it requires significant training data collection, ongoing retraining as agents are added or modified, and specialized ML infrastructure. Our agent ecosystem changes weekly. By contrast, GoDaddy's internal LLM provider - with unified access to multiple models, governance, and cost controls - delivers intelligent routing immediately and adapts automatically to new agents without retraining.
AWS Bedrock Agents with knowledge bases solve a different problem entirely - they help individual agents access relevant information. Our challenge is routing between agents, not within them. The knowledge we need comes from agent cards registered in ANS, not external knowledge bases.
Despite its routing limitations, we kept Strands SDK because it provides immense value: correct A2A protocol implementation, battle-tested agent-to-agent communication, seamless AWS integration, and handling of protocol complexity we'd otherwise need to build ourselves. By layering intelligent routing on top of Strands SDK, we get smart decisions with reliable communication - the best of both worlds.
When to use this pattern
Not every multi-agent system needs custom routing. If you have fewer than ten agents with clearly distinct purposes, the SDK's built-in routing will likely serve you well. The overhead of building and maintaining a custom routing layer only pays off when you hit specific scaling or operational challenges.
Build a custom routing layer when your agent count creates discovery problems. Once users start asking "which agent do I use for X?" or support tickets mention bouncing between agents, you've outgrown simple routing. In our experience, this threshold sits somewhere between 15 and 25 agents, depending on how much overlap exists in their capabilities.
Invest in routing transparency when you need accountability. Regulated industries, enterprise deployments, and any system where routing errors have real consequences benefit from explainable decisions. If you can't answer "why did the system choose agent X over agent Y?" when something goes wrong, you'll struggle to debug issues and build user trust.
Prioritize conversation-aware routing when your agents handle multi-turn interactions. Stateless routing works fine for one-shot queries, but the moment users engage in back-and-forth conversations, you need routing that remembers context. Without it, users risk being bounced to a different agent mid-conversation, losing all prior context.
Consider dual-LLM scoring when routing accuracy directly impacts user experience or business metrics. The added latency and cost of a second model evaluation only makes sense when misrouting carries meaningful consequences - frustrated users, failed transactions, or wasted specialist time. For low-stakes routing, a single model suffices.
Conversely, stick with out-of-the-box solutions if your agent ecosystem is stable, your routing requirements are straightforward, or you lack the engineering capacity to maintain custom infrastructure. The best routing system is the simplest one that meets your actual needs.
Conclusion
Building on proven infrastructure doesn't mean accepting its limitations. The Platform Routing Agent demonstrates that you can layer intelligence and control on top of battle-tested foundations without reinventing the wheel.
As AI agent ecosystems continue to grow, intelligent routing becomes increasingly critical. The pattern we've developed - combining LLM-based reasoning with robust communication protocols - provides a scalable, transparent, and performant solution to the agent discovery problem.
The future of multi-agent systems isn't just about building specialized agents - it's about building the intelligence to connect users with the right agent at the right time.