People love to compare the AI era to the cloud era. 

It makes sense; the analogy lies in the suggestion that value primarily accrues at the two ends of the stack: foundational infrastructure and applications. 

In the cloud era, the infra layer was dominated by a few public cloud players: AWS, Azure, and GCP, which are on track to generate over $250B in combined annual revenue, implying a collective value of roughly $2.5 trillion at a ~10x revenue multiple. As for applications, they exploded, producing well over 100 companies worth $1B+ and propelling many past $100B valuations.

You don’t have to stretch to see how the AI era might mirror the cloud era, with a few foundation model companies capturing outsized value (e.g., OpenAI, Anthropic) and many breakout app leaders (e.g., Sierra, Cursor, Abridge, Runway). This line of thinking appears sound, and valuations suggest it’s happening. But it also causes many to overlook the incredibly valuable middle layer.


Why Is the Middle Layer Valuable?

AI’s middle layer consists of four core areas that sit between the raw infrastructure and apps:

  1. Inference: Where a model (or several) generates outputs.
  2. Observability: Understanding the behavior of AI models.
  3. Testing & evals: Assessing the quality of AI models.
  4. Orchestration & deployment: Coordinating AI models, agents, and other intelligent systems and putting them into production.

Now, the middle layer was quite valuable in the cloud era. Areas like observability (Datadog, $40B market cap), security (Wiz, $32B acq.), middleware (MuleSoft, $7B acq.), and multi-cloud infra (HashiCorp, $13B IPO) produced big winners - to name a few. Continuing the cloud-to-AI analogy, we again expect big winners in each of these categories. But this time around, the “middle layer” has even more potential due to a few key differences:

  1. Open-Source - The open-source impact is very different in AI. In the cloud era, there were no open-source alternatives matching AWS at a fraction of the price. In AI, by contrast, the model layer is already starting to fragment because open-source models like DeepSeek and Llama deliver performance that rivals closed models from OpenAI and Anthropic. 
  2. Multi-Model - Heightened competition among both closed and open-source labs is producing higher-performing models at a breakneck pace. In the cloud era, public clouds weren’t leapfrogging each other on performance monthly. 
  3. Multi-Modal - Adding to the multi-model complexity, AI also serves different modalities, such as voice and video. In the cloud era, some workloads ran on specialized clouds, but that pales in comparison to AI’s multi-modal paradigm. 
  4. Portability - Shifting an application from cloud to cloud proved quite difficult; teams weren’t moving workloads on a monthly or quarterly basis. Shifting a model from cloud to cloud, by contrast, is much easier. 

The AI middle layer will play out very differently than the cloud era did. We ultimately didn’t end up in a truly multi-cloud world, despite the promise of technologies like Kubernetes. But we’re already living in a multi-model world, with many providers competing to be the best inference hosting option (Deep Infra, Fireworks, Baseten, etc.). And while hosting is just one aspect of inference, the most exciting prospects could lie in model routing.


Model Routing - AI’s Nexus Point

Model routing sits sandwiched between the applications and the models, and is the key enabler of the multi-model world. Companies like OpenRouter, Nexos.ai, Not Diamond, Martian, LiteLLM, and Unify each take different approaches to providing value. The benefits to users include:

  1. Cost Optimization - Martian, for example, helps route to lower-cost models while maintaining performance. Model Routers also remove the headache of managing multiple billing relationships across models and clouds.
  2. Model Testing - OpenRouter is built for developers to get started quickly, offering one of the widest model libraries and making it easy to swap and test models through a single API.
  3. Performance Improvements - Nimble LLM gateways like LiteLLM help customers improve and ensure performance with fallbacks if a model API fails (see the sketch after this list). 
  4. Prompt Management - Many are now offering prompt management. Not Diamond, for instance, takes this a step further with automatic prompt adaptation to route to the right model with the right prompt. There are also prompt management solutions like Braintrust that use evals to aid routing. 
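
To make the unified-API and fallback benefits concrete, here is a minimal sketch using LiteLLM’s completion() call, which exposes an OpenAI-style interface across providers. The model identifiers are illustrative, and the hand-rolled retry loop is a simplified stand-in for the managed fallback features these gateways provide:

```python
# Minimal sketch: one interface across many providers, with manual fallbacks.
# Model identifiers are illustrative - check your provider's docs.
from litellm import completion

# Ordered preference list: try the primary model first, then fall back.
MODELS = [
    "openai/gpt-4o",                               # primary (closed)
    "anthropic/claude-3-5-sonnet-20240620",        # fallback 1 (closed)
    "together_ai/meta-llama/Llama-3-70b-chat-hf",  # fallback 2 (open source)
]

def route_with_fallbacks(prompt: str) -> str:
    """Call models in preference order, falling back on any API error."""
    last_error = None
    for model in MODELS:
        try:
            response = completion(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception as err:  # rate limit, outage, auth failure, etc.
            last_error = err
    raise RuntimeError(f"All models failed; last error: {last_error}")
```

Note that swapping providers is just a change to a model string - exactly the portability point from the list above.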

Today, most model routes are hardcoded to the latest and greatest models. That could soon evolve into dynamic model routing based on modality and performance. The end state could be a more intelligent layer that dynamically changes routes, adapts prompts, and optimizes price and/or performance on a per-query basis (sketched below). As evals from sources like LMArena get more advanced, they could become another variable that informs model routing. But it remains early days. For now, developers are looking for a unified API to access the full breadth of closed and open-source models, test new models quickly, ensure performance, and free themselves from managing multiple inference providers. 
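
To make that end state more tangible, here is a toy sketch (not any vendor’s actual algorithm) of per-query routing: each candidate model gets a score balancing a hypothetical eval-derived quality number against its price, and the router picks the best one. All names and numbers below are invented for illustration:

```python
# Toy per-query router: trade off quality vs. cost per request.
# Quality scores and prices are invented; a real router would use
# live eval signals, latency stats, and the query's modality.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    quality: float            # hypothetical 0-1 eval score for this task type
    usd_per_1m_tokens: float  # illustrative blended price

CANDIDATES = [
    Candidate("frontier-closed-model", quality=0.95, usd_per_1m_tokens=15.00),
    Candidate("mid-tier-model",        quality=0.88, usd_per_1m_tokens=3.00),
    Candidate("open-source-model",     quality=0.82, usd_per_1m_tokens=0.60),
]

def route(candidates: list[Candidate], cost_weight: float) -> Candidate:
    """Pick the model that maximizes quality minus a cost penalty.

    cost_weight encodes this query's price sensitivity: 0 means
    "best quality regardless of price"; higher values favor cheap models.
    """
    max_price = max(c.usd_per_1m_tokens for c in candidates)

    def score(c: Candidate) -> float:
        # Normalize price against the most expensive candidate.
        return c.quality - cost_weight * (c.usd_per_1m_tokens / max_price)

    return max(candidates, key=score)

print(route(CANDIDATES, cost_weight=0.0).name)  # -> frontier-closed-model
print(route(CANDIDATES, cost_weight=0.5).name)  # -> open-source-model
```

A production router would fold in modality, latency, context length, and live eval data (the LMArena-style ratings mentioned above), but the core trade-off looks like this.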


[Figure: A layered flow diagram mapping how applications connect to AI model providers through a middle “Model Routing” layer. The routing layer (LiteLLM, Martian, Nexos.ai, OpenRouter, Unify, and a diamond icon, presumably Not Diamond) covers bill consolidation, cost optimization, cost tracking, inference management, model libraries, model access, model testing, and prompt management. Beneath it sit four provider categories: self-hosted models (AMD, Dell, NVIDIA), open-source inference clouds (deepinfra, fal, together.ai), public cloud hosted (AWS, Azure, Google Cloud), and closed-source models (Anthropic, OpenAI, Grok).]


A $10B+ Platform Opportunity

This is where the $10B+ platform opportunity begins. We see a world where a leading model routing layer could capture more and more developer attention. Routers may become the “app store” of AI models - think Hugging Face meets AWS Marketplace. That has major implications for the infrastructure underneath this middle layer. Inference providers and public clouds could be abstracted away and commoditized, competing only on price and eventually fading from the picture.

Owning developer attention and becoming the go-to testing playground is a strong wedge into adjacent areas: prompt management and optimization, observability, compliance, security, cost tracking, fine-tuning, evaluation and feedback loops, and more. Developer love is paramount. Did developers care whether Twilio used AT&T or Verizon for communication services? (Hint: no.)


The Risk: Inference Providers Strike Back

Now, the risk to the middle is that foundational players - inference providers and public clouds alike - move up the stack. Or the frontier labs build an AGI so powerful that they sell it to enterprises with a full infrastructure stack behind it. In the cloud era, AWS, Azure, and GCP didn’t allow their platforms to become commoditized compute and storage. Instead, they built endless products and developer tooling to lock customers into their ecosystems. Cloud history suggests that stickiness in AI may very well come from the prompting, fine-tuning, evaluation, and monitoring layer. With the writing on the wall, it’s no surprise to see a cloud provider like CoreWeave acquire an evaluation platform like Weights & Biases (a Felicis investment). We believe more M&A aimed at owning this layer will likely follow.


The Race for the Middle

The middle layer is a valuable position, and it’s up for grabs. The winner may be a model routing platform, an evaluation platform, or a leading observability player. But because model routing is inherently vendor- and inference-cloud-agnostic, it’s well positioned to deliver developer value today and capture these adjacent areas in the future.



This content is provided solely for informational purposes and should not be relied upon as investment, business or tax advice. Under no circumstances should this content be relied upon when making any investment decision. Any referenced Felicis portfolio companies are not representative of all prior investments and there can be no assurance that future investments will have similar characteristics or results. Refer to our Terms of Service for additional important information.