
The AI Stack Is Half-Built

Notes from a survey of twenty-three AI-native engineering leaders on what they spend, what they ship, and what they still build themselves.

Tobi Coker


AI is the first software wave to reach mass deployment before its infrastructure was built.

In every prior cycle, the orchestration and operations layers arrived before the workloads. The cloud era had Hadoop running data pipelines and Datadog watching them well before most enterprises had completed their migration. With AI, that order is inverted. Agents are running in production at almost half the AI-native cohort. Models are being swapped, post-trained, and chained into multi-step workflows in front of paying customers. The orchestration, observability, and reliability primitives that should sit underneath all of it are still being built — in public, after the fact, by the teams running the workloads.

We wanted to know what those teams are actually living with. So we surveyed twenty-three engineering leaders anonymously across the AI-native cohort, seed through public, and asked them to describe their stack. Not what they planned to buy. What they ship, what they spend, and what they're still wiring together themselves. The respondents run engineering at some of the most recognizable tech companies in the world, and we promised them anonymity so they could answer openly.

Teams Want To Buy What's Hardest To Build

Three findings sharpen the infrastructure-layer investment thesis that led to our investment in Deep Infra, among others. All three are consequences of the same pattern: the stack these workloads demand wasn't ready when the workloads arrived.

Inference is the new line item

Sixteen of the twenty-three teams we surveyed — 69.6% — more than doubled inference spend in the last six months. Among teams that train at all, inference runs at multiples of training spend, with 34.8% reporting it's more than 5x. Cost per token was the most-cited compute bottleneck in our survey: 43.5% named it as their single biggest blocker, more than any other answer. Async workloads are growing. Agents are extending run lengths. The line items keep moving up and to the right, and stories of teams burning through their annual AI budget in a single quarter have stopped being unusual.

[Chart: Inference Spend Is Climbing, Fast]

Inference being expensive isn't the surprise. The surprise is that nobody has agreed on how to watch it. The largest single segment of teams in our survey — 56.5% — monitors AI pipelines with custom internal dashboards. LangSmith is at 30.4%. Datadog adaptations sit at 26.1%. There is no observability standard at the layer where money is being burned in real time. This is what the inversion looks like in practice: the cost line is fully formed; the operations layer that should sit underneath it has not arrived.
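In practice, that custom plumbing often starts as a few dozen lines of per-model accounting. Here is a minimal sketch of the kind of tally teams rebuild internally; the model names and per-million-token prices are hypothetical, not quotes from any provider:

```python
from collections import defaultdict

# Hypothetical (input, output) prices per 1M tokens; real prices vary
# by model, provider, and contract.
PRICE_PER_M_TOKENS = {
    "frontier-large": (3.00, 15.00),
    "oss-small": (0.10, 0.40),
}

class CostMeter:
    """Aggregates inference spend and token volume per model: the core of
    the custom internal dashboards most surveyed teams run today."""

    def __init__(self):
        self.spend = defaultdict(float)   # model -> dollars
        self.tokens = defaultdict(int)    # model -> total tokens

    def record(self, model, prompt_tokens, completion_tokens):
        in_price, out_price = PRICE_PER_M_TOKENS[model]
        cost = (prompt_tokens * in_price
                + completion_tokens * out_price) / 1_000_000
        self.spend[model] += cost
        self.tokens[model] += prompt_tokens + completion_tokens
        return cost

meter = CostMeter()
meter.record("frontier-large", prompt_tokens=2_000, completion_tokens=500)
meter.record("oss-small", prompt_tokens=2_000, completion_tokens=500)
```

A real dashboard adds time windows, per-customer attribution, and alerting on burn rate, which is exactly the part no standard tool yet owns.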

The fastest-growing cost line in modern software is being managed with home-built spreadsheets and Grafana boards. That gap will close. The company that closes it will be operating at the budget center of every AI-native team in the market.

What's coming next for inference is bifurcation. The data already shows the seam. 52.2% of teams have 10 to 30% of their workloads running asynchronously, and the async side prioritizes throughput first, cost second. The synchronous side prioritizes latency and reaches for the fastest frontier model. As inference grows from its current 5x training-spend ratio to whatever comes next, the stack that serves async workloads cost-effectively at scale will not look like the stack that serves real-time agentic interactions. The same is true across modalities — text inference will not look like video inference, and neither will look like the world models that are still emerging. Today most teams use one inference path for all of it. That won't last.
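That seam already shows up in code as a single dispatch point. A hedged sketch, with both paths hypothetical: synchronous traffic takes a latency-first path, while asynchronous work queues for throughput-first batch serving.

```python
import queue

def route(request, fast_path, batch_queue):
    """Dispatch by workload type: sync traffic goes straight to the
    latency-optimized path; async work is queued and drained in bulk
    by a cost/throughput-first batch runner."""
    if request["mode"] == "sync":
        return fast_path(request)
    batch_queue.put(request)
    return None

# Stub standing in for a frontier-model call; illustration only.
fast_path = lambda req: f"answered:{req['task']}"
batch_q = queue.Queue()

sync_result = route({"mode": "sync", "task": "chat"}, fast_path, batch_q)
route({"mode": "async", "task": "bulk-summarize"}, fast_path, batch_q)
```

Today both branches usually end at the same inference backend; the bifurcation argument is that they soon won't.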

OSS inference is the wedge underneath both halves. As one respondent put it: "A lot of companies provide inference as a service, but either reliability or performance (due to quantization) are poor." The teams in our survey ranked OSS inference dead last on the list of things they want to build internally. They want to buy it. Today, they can't really buy it well.

Agents are in production. Their infrastructure isn't

Eleven of twenty-three teams — 47.8% — run autonomous agent workflows as a core product feature. Not as a demo. Not as a roadmap item. As the thing they ship. Add the teams running early pilots and the share climbs to 65.2%. The pattern holds across stages: agent adoption is roughly as common at seed-stage companies as at Series C+ teams, which is the clearest single signal in the survey that this is a workload-driven shift, not a maturity-driven one.

The same teams are wiring the failure-recovery harness, the durable state layer, and the workflow replay logic themselves. From scratch. Here is one respondent describing the gap they wish someone would close:

"Reliable failure handling and recovery for long-running agent workflows. Inference APIs are getting faster and cheaper but there's no good primitive for 'retry this 20-step agent run from step 10.' The ecosystem optimizes for individual completions, not durable multi-step execution."

Another put it more bluntly: stitching today's agent frameworks together into one reliable system "still takes too much custom glue."
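The primitive the "retry from step 10" quote asks for is easy to sketch and hard to build durably. A minimal illustration in Python, with hypothetical step functions, of checkpoint-and-resume so a retry re-runs only the steps after the last completed one:

```python
def run_agent(steps, state, checkpoints):
    """Execute a multi-step workflow, persisting state after each
    completed step. On retry, execution resumes from the last
    checkpoint instead of step one. A sketch only: a production
    version needs durable storage and idempotent steps."""
    if checkpoints:
        state = checkpoints[-1]           # resume from last completed step
    for i in range(len(checkpoints), len(steps)):
        state = steps[i](state)           # may raise; prior progress survives
        checkpoints.append(state)
    return state

class FlakyStep:
    """Fails once, then succeeds, to simulate a transient mid-run error."""
    def __init__(self):
        self.failed = False

    def __call__(self, state):
        if not self.failed:
            self.failed = True
            raise RuntimeError("transient failure")
        return state + ["c"]

steps = [lambda s: s + ["a"], lambda s: s + ["b"], FlakyStep(),
         lambda s: s + ["d"]]
ckpts = []
try:
    run_agent(steps, [], ckpts)           # crashes at step 2...
except RuntimeError:
    result = run_agent(steps, [], ckpts)  # ...retry re-runs only steps 2-3
```

Everything hard is elided: the in-memory list stands in for durable storage, and real steps need idempotency because a step can fail after acting on the world but before checkpointing.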

Agent orchestration was the second-least-likely category to be built internally going forward, according to teams in our survey — only 26.3% of those who answered the build-vs-buy question said they'd build agent orchestration themselves. They have decided they don't want to keep doing it. They are waiting for the right primitive.

The pattern we are watching is the one that turned web infrastructure into a category twenty years ago: a primitive painful enough at small scale that the build-vs-buy decision becomes obvious by the time a team is running real workloads. The difference, this time, is that the workloads got there first.

The model layer is a commodity and a premium good at the same time

Two opposing truths are running simultaneously, and both are visible in the data.

The first truth: the model layer is a commodity. 82.6% of teams run two or more models in production at once. The single largest cohort (43.5%) runs four to seven. 95.7% use Anthropic, but 82.6% of those teams are open to switching, and 34.8% — more than a third of the cohort — switch frequently based on performance. Models are being swapped task by task. Today's preference is one benchmark away from being yesterday's.

The second truth: the model layer is a premium good. 56.5% of teams use closed frontier models exclusively for their most critical workloads. They pay the markup. They pay it because frontier quality is the only thing that gets the task to ship-ready, and model quality is one of the two most-cited bottlenecks to scaling — tied with GPU availability at the top of the list. The frontier APIs are simultaneously the cheapest commodity in the stack and the most expensive premium good in it, depending on which workload you ask about.

[Chart: Two Truths About The Model Layer]

The second truth is currently subsidizing the first. As open-source quality climbs, the premium half collapses. The teams that survive the collapse will be the ones whose product differentiation lives somewhere other than which API they call.

Most of those teams already know this. 47.8% post-train models on their own data — another 17.4% are running early pilots. This isn't an optimization. As one engineering leader running a public-stage company told us, post-training on your own data is what gets the model to high-enough efficacy that you trust it in front of a customer. The frontier APIs cannot do tasks end-to-end at the quality these teams need to ship. The gap is closed in post-training, not in better prompts.

The bottleneck on post-training isn't compute. It's measurement. Evals were the most-cited tooling gap in our survey: 45% of the teams who answered the question named them as the number-one unsolved problem. 57.9% would build their own eval framework rather than buy. The reason isn't preference. As one respondent put it: "Getting good evals is so hard, especially when running evals trying to use the entire system." The existing tools test individual LLM calls when what teams need is end-to-end pipeline testing across an agentic workflow.
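The distinction respondents draw, call-level checks versus end-to-end pipeline testing, can be made concrete with a small sketch. Everything here is hypothetical (the toy pipeline, the stub model, the exact-match scorer); the point is only that the assertion lands on the pipeline's final output, not on any individual model call:

```python
def run_pipeline(query, llm):
    """Toy two-step agentic pipeline (plan, then answer), standing in
    for a real multi-call workflow. `llm` is any callable taking a
    prompt string and returning a completion string."""
    plan = llm(f"plan: {query}")
    return llm(f"answer with plan '{plan}': {query}")

def eval_end_to_end(cases, pipeline, scorer):
    """Score only the final output per case. Intermediate calls are
    free to change (new model, new prompts) as long as end-to-end
    results hold, which is what call-level eval tools miss."""
    scores = [scorer(pipeline(case["input"]), case["expected"])
              for case in cases]
    return sum(scores) / len(scores)

# Stub model and a single case, for illustration only.
stub_llm = lambda prompt: "Paris" if "france" in prompt.lower() else "unknown"
cases = [{"input": "What is the capital of France?", "expected": "Paris"}]
score = eval_end_to_end(cases,
                        lambda q: run_pipeline(q, stub_llm),
                        lambda out, exp: float(out == exp))
```

A real harness replaces the exact-match scorer with graded or model-judged scoring, but the shape is the same: the eval wraps the whole workflow, not one completion.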

The bet is that managed continual learning, plus the evals that prove the post-training is working, turns model differentiation from a research project into a procurement line item — and turns the premium-frontier dependency from an existential bet into an option.

Where the next companies come from

We mapped where the billion-dollar AI infrastructure companies of the last decade actually came from. Berkeley's Sky Computing Lab is the alma mater of Databricks, Anyscale, Inferact, and RadixArk: four companies, more than any other lab on the map. Stanford's Hazy Research can claim Together AI's chief scientist and an advisor on the FlashAttention algorithm, which every major model in production now depends on. The international labs are the next edge: ETH Zurich's systems group, the Hasso Plattner Institute in Potsdam, and TU Munich are producing top-tier distributed-systems researchers, including the CTO who built Prime Intellect's foundation. No US fund has a formal partnership at any of them.

What we'll do next

Our goal in running this survey wasn't to predict winners. It was to understand what the engineering leaders building AI-native products are willing to pay for, what they refuse to build themselves, and what they wish existed. The three findings above are where we think the answer is clearest right now, and they trace the same shape: the cost line is real, the workloads are real, and the stack underneath them is being built in reverse — in public, by the teams running the workloads.

We will run this survey again in six months. If you are building infrastructure in any of these areas, get in touch.

Authors

  • Tobi Coker

    Partner

Tags

    AI Infra
