Consensus of LLMs: Scaling Accuracy Beyond a Single Model

Key takeaways

CoL orchestrates multiple complete LLMs from different vendors at the application layer, unlike MoE which routes internally within a single model.
Consensus operates at three levels: micro for validating individual steps, mid for key decision points, and high for full request orchestration with confidence scores.
The approach requires multiple models to agree before showing results to customers, reducing hallucinations in high-stakes business scenarios like security guidance or account changes.

LLMs are changing how we search, code, write, and make business decisions. At GoDaddy, we've been exploring ways to make them more reliable, more explainable, and better aligned with the specific needs of small business owners.

The industry has already experimented with Mixture of Experts (MoE) architectures — where different parts of a single model specialize in different kinds of problems. MoE works inside a model, routing tokens to specialized feed-forward networks that can handle them best.

But there's another layer of opportunity: instead of relying on a single LLM's internal routing, we can orchestrate multiple entire LLMs — from different vendors, with different architectures, or even different prompting strategies — and have them work together toward a consensus answer.

We call this approach Consensus of LLMs (CoL).

Of course, simply having multiple models vote isn't enough. The magic is in how we orchestrate them: using different prompting strategies to approach problems from multiple angles, implementing smart routing that knows which models excel at which tasks, and building verification layers that can spot when consensus might be misleading.

The core idea

CoL is about asking multiple models (or multiple configurations of the same model) the same question, but in slightly different ways, and then aggregating their responses to produce a single, trustworthy output.
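To make the idea concrete, here is a minimal sketch of the aggregation step. It is an illustration under stated assumptions, not our production implementation: `consensus`, `normalize`, and the `Voter` shape are hypothetical names, each voter is assumed to be a plain callable wrapping some model call, and exact-match voting after light normalization stands in for whatever comparison a real system would use.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical voter shape: (callable that sends a prompt to a model, prompt template).
# The template lets each voter receive the same question with a different framing.
Voter = tuple[Callable[[str], str], str]


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())


def consensus(
    question: str, voters: dict[str, Voter], min_agree: int = 2
) -> tuple[Optional[str], list[str]]:
    """Ask every voter the question and return (answer, agreeing voter names).

    Returns (None, []) when fewer than `min_agree` voters give the same answer.
    Ties are broken by whichever answer was seen first.
    """
    votes = {
        name: normalize(ask(template.format(question=question)))
        for name, (ask, template) in voters.items()
    }
    if not votes:
        return None, []
    answer, count = Counter(votes.values()).most_common(1)[0]
    if count < min_agree:
        return None, []
    return answer, sorted(name for name, vote in votes.items() if vote == answer)
```

Returning the names of the agreeing voters alongside the answer is what makes provenance easy to surface later.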

Unlike MoE, where routing happens deep inside a single LLM's architecture, CoL happens outside the model as part of an orchestration layer we control. This gives us several advantages:

Model/vendor independence – Swap in GPT, Claude, Gemini, open-source Llama, or fine-tuned proprietary models as needed.
Dynamic control over cost vs. accuracy – Add more "voters" only when confidence is low.
Transparent provenance – Show which models agreed and which sources were cited.

Three levels of consensus

The following table describes the three levels of consensus used in the CoL model.

Level	Description	Example
Low-level consensus (Micro-aggregation inside a flow)	Consensus occurs at small units of reasoning by validating each step against multiple outputs before continuing.	When generating a database migration script, two different models each produce the SQL changes; a verifier checks that the two outputs match before moving to the execution phase.
Mid-level consensus (Decision points in multi-step chains)	Consensus occurs at key workflow points by running decision steps across several models and choosing the most consistent result.	In a customer-support chatbot, before sending a response with account-impacting instructions, we confirm at least three models agree on the same safe and accurate recommendation.
High-level consensus (Full orchestration outside the prompt)	Consensus occurs through full orchestration by receiving requests, consulting multiple models, gathering outputs, applying consensus algorithms, and returning results with confidence scores.	For market analysis, we consult four different LLMs, each prompted with a slightly different framing, and then combine results where at least 75% agreement exists, weighted by citations from reputable sources.
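The high-level example in the table, with its 75% agreement threshold and citation weighting, can be sketched roughly as follows. This is an assumption-laden illustration: the `Vote` class, the `weight` field, and `weighted_consensus` are hypothetical, and how a weight would actually be derived from citations is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vote:
    model: str
    answer: str
    # Hypothetical evidence weight, e.g. higher when cited sources check out.
    weight: float = 1.0


def weighted_consensus(
    votes: list[Vote], threshold: float = 0.75
) -> tuple[Optional[str], float]:
    """Return (answer, confidence) for the answer with the largest weight share.

    Confidence is the winning answer's share of the total weight. The answer
    is None when that share falls below `threshold`.
    """
    total = sum(v.weight for v in votes)
    if total <= 0:
        return None, 0.0
    share: dict[str, float] = defaultdict(float)
    for v in votes:
        share[v.answer] += v.weight
    best = max(share, key=lambda answer: share[answer])
    confidence = share[best] / total
    return (best if confidence >= threshold else None), confidence
```

With four equally weighted models, three matching answers give a confidence of exactly 0.75, which just meets the threshold; raising the weight of the dissenting model can pull the result back below it.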

How we might use it at GoDaddy

The following list describes possible use cases for the CoL model at GoDaddy.

AI-assisted domain name search - Multiple models brainstorm creative name ideas from the same set of keywords. Consensus filters out weak or repetitive suggestions, showing users only names agreed upon by at least two models and available for purchase.
Customer support bots - Consensus ensures no single model can give an unsafe or policy-violating response. Each response must pass both model agreement and a rules-based verification before going to a customer.
Website copy generation - Different models generate marketing text, and consensus selects the most on-brand, policy-compliant copy. We can blend the best headlines from one model with the best calls-to-action from another.
Security guidance - Consensus is used for advice in sensitive areas like SSL/TLS configuration or DNS changes, ensuring multiple models agree on secure practices before presenting them to users.
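The customer support case combines two gates: model agreement and a rules-based check. A minimal sketch of that double gate might look like the following. Everything here is hypothetical: we assume each model returns a short action label rather than free text, `ALLOWED_ACTIONS` stands in for a real rules engine, and the three-model minimum mirrors the mid-level example above.

```python
from collections import Counter
from typing import Optional

# Hypothetical allow-list standing in for a real rules-based verifier.
ALLOWED_ACTIONS = {
    "send_password_reset_email",
    "explain_renewal_policy",
    "link_to_dns_help",
}


def passes_rules(action: str) -> bool:
    """Rules gate: only actions on the allow-list may reach a customer."""
    return action in ALLOWED_ACTIONS


def release_action(
    recommendations: dict[str, str], min_agree: int = 3
) -> Optional[str]:
    """Return the action to send, or None if either gate fails.

    `recommendations` maps a model name to the action label it proposed.
    """
    if not recommendations:
        return None
    action, count = Counter(recommendations.values()).most_common(1)[0]
    if count >= min_agree and passes_rules(action):
        return action
    return None
```

Because the rules gate runs after the vote, even a unanimous recommendation is withheld when it falls outside policy; what happens on `None` is left to the caller.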

Advancements we see coming

The following list describes future enhancements we're exploring for the CoL model.

Dynamic confidence routing - Instead of always calling multiple models, we can start with a fast, cost-effective model, and if its confidence is low or the task is high-stakes, automatically bring in others to cross-check.
Evidence-weighted voting - Rather than just counting votes, we'll weight them by how well each model supports its answer. We can evaluate whether there are verifiable sources, whether citations check out, and for code, whether it compiles and passes tests.
Judgment models - A "judge" model reads multiple outputs and decides which is most likely correct, potentially informed by historical performance data about which models excel at which tasks.
Continuous learning from disagreement - When models disagree, we log those cases, use them to fine-tune in-house models for those scenarios, and refine the consensus algorithm over time.
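Dynamic confidence routing, the first item above, is the easiest of these to sketch. The version below is a hedged illustration only: `route` and the `Model` shape are hypothetical, and it assumes each model can return a confidence score in [0, 1] alongside its answer, which many model APIs do not provide directly.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical model interface: prompt in, (answer, confidence in [0, 1]) out.
Model = Callable[[str], tuple[str, float]]


def route(
    prompt: str,
    fast_model: Model,
    cross_checkers: list[Model],
    high_stakes: bool = False,
    min_confidence: float = 0.8,
) -> tuple[Optional[str], int]:
    """Return (answer, number of models called).

    The fast model's answer is used alone when it is confident and the task
    is not high-stakes. Otherwise the cross-checkers are consulted and the
    answer must win a strict majority of all answers, or None is returned.
    """
    answer, confidence = fast_model(prompt)
    if confidence >= min_confidence and not high_stakes:
        return answer, 1
    answers = [answer] + [model(prompt)[0] for model in cross_checkers]
    top, count = Counter(answers).most_common(1)[0]
    if count * 2 > len(answers):
        return top, len(answers)
    return None, len(answers)
```

The second return value makes the cost side of the trade-off visible: routine requests cost one call, while low-confidence or high-stakes requests pay for the full panel.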

Why this matters

As we integrate AI deeper into GoDaddy's customer experience, trust becomes non-negotiable. CoL gives us a framework for reducing hallucinations, increasing factual accuracy, and providing transparency to customers about how answers are chosen.

We're not replacing human judgment — we're building AI systems that are more consistent, verifiable, and aligned with our business values.

Closing thoughts

CoL takes the spirit of MoE and applies it at the orchestration level, across entire models. It's a natural evolution for enterprises that want the flexibility to choose the right model for the right moment — and the safety net of multiple perspectives when accuracy matters most.

We see this as a foundation for future AI features at GoDaddy: reliable, explainable, and customer-first.