# The LLM router was three products wearing one name

URL: https://www.thedeepfeed.ai/posts/2026-06-09-llm-router-three-products-one-name/
Category: Tools
Published: 2026-06-09
Author: the-deep-feed
Tags: llm-router, ai-gateway, openrouter, inference, infrastructure
Kind: deep

> Three different products spent two years fighting over the phrase 'LLM router.' Aggregation won at a $1.3B valuation, gateways became table stakes, and the most-hyped category — semantic cost-routing — quietly froze on GitHub.

## TL;DR

- **'LLM router' meant three different products.** Aggregation (route the same model to the cheapest provider), semantic routing (pick a different, cheaper model per prompt), and gateways (failover, observability, key management).
- **Aggregation won.** OpenRouter raised **$113M** at **~$1.3B** in May 2026, moving **25 trillion tokens a week** — and takes no token markup.
- **Semantic cost-routing stalled.** RouteLLM has had no commit since **August 2024**; Martian's homepage now sells interpretability research, not a router.
- **The squeeze is structural.** When the 'good enough' tier already costs **$0.10–0.60 per million tokens**, the dollars a per-prompt router can save shrink toward noise while its overhead stays fixed.
- **The survivor sells upstream.** Not Diamond stayed alive by powering OpenRouter's auto-router — routing became a feature inside aggregation, not a business beside it.

For about two years, three companies could each say "we build an LLM router" and all three would be telling the truth. They were also describing three unrelated products. **OpenRouter** routes a request for Claude to whichever provider serves Claude cheapest that second. Martian routed a request to a *different, cheaper model* it predicted would answer just as well. LiteLLM routes around a provider outage and logs what the request cost. Same verb, three businesses, one crowded word.

By June 2026 the ambiguity has resolved itself, and the resolution is uneven. One of those three products is a unicorn. One became a free feature bundled into major developer platforms. And the third, the one that got the conference talks, the "beats GPT-4 at a fraction of the cost" headlines, and the patent filings, is a graveyard of frozen GitHub repositories. The most technically interesting version of routing turned out to be the worst business.

This is a map of who won which fight, and why the category everyone was most excited about is the one that quietly died.

![One LLM-router box splitting into three paths: aggregation thrives, semantic routing fails in red, gateway is a plain utility valve.](/post-images/2026-06-09-llm-router-three-products-one-name/three-products-one-word.jpg)

## Three products, one word

The confusion is worth taking seriously because it shaped how the market was funded. Investors heard "router" and priced a single category. There were always three.

| Category | What it routes | Value proposition | 2026 status |
|---|---|---|---|
| **Aggregation** | The same model, to the cheapest/fastest provider | One API, every model, no lock-in | 🟢 Won decisively |
| **Semantic routing** | A *different*, often cheaper model per prompt | Cut spend by downgrading easy queries | 🔴 Stalled |
| **Gateway / ops** | Around failures; logs, keys, caching, limits | Production reliability and observability | 🟢 Became table stakes |

The distinction that matters most is between the first two, because they sound identical and are economic opposites. Aggregation assumes you already know which model you want and competes on *delivery*: better price, lower latency, higher uptime for the model you chose. Semantic routing competes on *selection*. It presumes you picked too expensive a model and that it can quietly swap in a cheaper one. The first sells you a logistics layer. The second sells you a bet that it knows your quality bar better than you do.

Hold that difference. It explains everything that follows.
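As a rough sketch of that difference, the two routers can be written as two different `min` calls over the same price list. Everything here is hypothetical: the `Offer` type, the provider and model names, the prices, and the predicted-quality scores are illustrative assumptions, not any vendor's API or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str
    model: str
    usd_per_mtok: float  # hypothetical price per million tokens


def route_aggregation(model: str, offers: list[Offer]) -> Offer:
    """Delivery: the caller fixed the model; pick the cheapest provider serving it."""
    candidates = [o for o in offers if o.model == model]
    if not candidates:
        raise LookupError(f"no provider serves {model}")
    return min(candidates, key=lambda o: o.usd_per_mtok)


def route_semantic(predicted_quality: dict[str, float],
                   quality_bar: float,
                   offers: list[Offer]) -> Offer:
    """Selection: pick the cheapest offer whose model is predicted to clear the bar."""
    candidates = [o for o in offers
                  if predicted_quality.get(o.model, 0.0) >= quality_bar]
    if not candidates:
        raise LookupError("no model is predicted to clear the quality bar")
    return min(candidates, key=lambda o: o.usd_per_mtok)


# Hypothetical price list: two providers for one model, one provider for another.
offers = [
    Offer("provider-a", "big-model", 3.00),
    Offer("provider-b", "big-model", 2.50),
    Offer("provider-c", "small-model", 0.15),
]

same_model = route_aggregation("big-model", offers)
swapped = route_semantic({"big-model": 0.95, "small-model": 0.90},
                         quality_bar=0.85, offers=offers)
print(same_model.provider, same_model.model)
print(swapped.provider, swapped.model)
```

The aggregation router never changes the model the caller asked for. The semantic router's answer depends entirely on `predicted_quality` and `quality_bar`, which is the bet the paragraph above describes.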

## Aggregation won

**OpenRouter**, founded in 2023, is the cleanest win in the category. In June 2025 it raised a combined $40M seed and Series A co-led by a16z and Menlo Ventures, with Sequoia participating ([a16z](https://a16z.com/announcement/investing-in-openrouter/), [GlobeNewswire](https://www.globenewswire.com/news-release/2025/06/25/3105125/0/en/OpenRouter-raises-40-million-to-scale-up-multi-model-inference-for-enterprise.html)). Eleven months later it raised a $113M Series B led by CapitalG, Alphabet's growth fund, with NVIDIA's NVentures, ServiceNow, MongoDB, Snowflake, and Databricks all joining the round ([BusinessWire](https://www.businesswire.com/news/home/20260526953416/en/OpenRouter-Raises-%24113-Million-CapitalG-led-Series-B-as-Weekly-Volume-Explodes-to-25T-Tokens)). TechCrunch put the post-money valuation at roughly $1.3B ([TechCrunch](https://techcrunch.com/2026/05/26/openrouter-more-than-doubles-valuation-to-1-3b-in-a-year/)).

The traction number is the one to sit with. OpenRouter says it moves 25 trillion tokens a week across 400-plus models and 8 million users — a fivefold jump from 5 trillion tokens a week just six months earlier ([BusinessWire](https://www.businesswire.com/news/home/20260526953416/en/OpenRouter-Raises-%24113-Million-CapitalG-led-Series-B-as-Weekly-Volume-Explodes-to-25T-Tokens)). The company frames its own thesis as the death of single-model loyalty:

> Running inference at scale is fundamentally a multi-model problem. The era of picking a single model is over. Success now depends on continuously routing across a changing market.
>
> — Alex Atallah, CEO, [OpenRouter Series B announcement](https://www.businesswire.com/news/home/20260526953416/en/OpenRouter-Raises-%24113-Million-CapitalG-led-Series-B-as-Weekly-Volume-Explodes-to-25T-Tokens), May 26, 2026

Here is the part that should make every semantic-routing founder wince: OpenRouter takes no markup on tokens. It passes through each provider's list price and monetizes with a roughly 5.5% fee on credit purchases plus a ~5% fee on bring-your-own-key usage ([OpenRouter pricing](https://openrouter.ai/pricing), [independent breakdown](https://ofox.ai/blog/openrouter-pricing-hidden-markup-breakdown-2026/)). It does not promise to save you money on inference. It promises to never be the reason you can't switch. That is a logistics business, Stripe for tokens, and it scales on raw volume, not on the cleverness of any individual routing decision.
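To see why that model scales with volume, here is a minimal arithmetic sketch. It uses the roughly 5.5% credit-purchase fee cited above and assumes, for illustration only, that the fee is added on top of the credit amount; the function name and the spend figures are hypothetical.

```python
CREDIT_FEE = 0.055  # roughly 5.5%, the credit-purchase fee cited above


def total_outlay(list_price_spend_usd: float, fee_rate: float = CREDIT_FEE) -> float:
    """Dollars paid to cover `list_price_spend_usd` of tokens at list price,
    assuming the fee is added on top of the credit purchase."""
    return list_price_spend_usd * (1 + fee_rate)


# Hypothetical spend levels: the fee grows linearly with token volume.
for spend in (100.0, 10_000.0):
    paid = total_outlay(spend)
    print(f"${spend:,.0f} of tokens -> ${paid:,.2f} paid, fee ${paid - spend:,.2f}")
```

Under this assumption the platform's take depends only on how many dollars of tokens flow through it, not on which model any single request was sent to.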

The builders watching this market read it the same way. The product, increasingly, is the routing layer itself, not the model underneath:

> The AI model market is turning into an inference spot market. Developers are routing to whatever model gives the best cost, latency, context, coding ability, and reliability for the job.
>
> — [@ollobrains](https://x.com/ollobrains/status/2063843002750206097), June 8, 2026

An inference spot market rewards the exchange, not the trader with the best model-picking algorithm. Aggregation built the exchange.

## The semantic-routing graveyard

Now the category that was supposed to be the clever one.

The pitch was genuinely compelling. Most production prompts are easy. A model that costs thirty times less can answer them indistinguishably. So put a small, fast classifier in front of every request, predict whether it needs the expensive model, and route accordingly. The academic version, **RouteLLM** from LMSYS and UC Berkeley, reported the dream numbers: "cost reductions of over 85% on MT Bench... while still achieving 95% of GPT-4's performance" ([LMSYS](https://lmsys.org/blog/2024-07-01-routellm/), [arXiv 2406.18665](https://arxiv.org/abs/2406.18665)). The paper went to ICLR 2025. The repository collected 4,991 stars.

It has not received a commit since August 10, 2024 ([lm-sys/RouteLLM](https://github.com/lm-sys/RouteLLM)). The canonical open-source routing framework has been frozen for roughly twenty-two months while the rest of the inference stack churns weekly.
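The pitch itself, a small classifier in front of every request, fits in a few lines. The sketch below is a toy: `difficulty_score` is a hand-written heuristic standing in for a learned classifier, and the trigger words, threshold, and model names are all hypothetical. It is not RouteLLM's method or API.

```python
def difficulty_score(prompt: str) -> float:
    """Toy stand-in for a learned classifier: longer prompts and a few
    hypothetical trigger words push the score toward 1.0."""
    hard_words = ("prove", "refactor", "derive", "optimize")
    score = min(len(prompt) / 500, 1.0) * 0.5
    if any(word in prompt.lower() for word in hard_words):
        score += 0.5
    return score


def pick_model(prompt: str, threshold: float = 0.5) -> str:
    """Send the prompt to the strong model only when the score clears the threshold."""
    return "strong-model" if difficulty_score(prompt) >= threshold else "cheap-model"


prompts = [
    "What is the capital of France?",
    "Prove that the sum of two even numbers is even.",
]
for p in prompts:
    print(pick_model(p), "<-", p)
```

Raising `threshold` sends more traffic to the cheap model and raises the risk of a wrong downgrade. That single dial is where the cost-versus-quality tradeoff lives.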

That is not an isolated repo going stale. It is the pattern across the whole category.

| Company | Original pitch | Where it went |
|---|---|---|
| **Martian** | "We invented the first LLM router" | Interpretability research firm |
| **Unify** (YC W23) | "The best LLM on every prompt" | Agent "virtual colleague" runtime |
| **RouteLLM** (LMSYS) | 85% cost cut at 95% quality | Dormant since Aug 2024 |
| **Not Diamond** | Best model on every request | Supplier to OpenRouter's auto-router |

Martian is the sharpest pivot. The company raised $9M on the strength of a routing pitch, with Accenture Ventures investing in September 2024 specifically citing its "patent-pending LLM router" ([Accenture](https://newsroom.accenture.com/news/2024/accenture-invests-in-martian-to-bring-dynamic-routing-of-large-language-queries-and-more-effective-ai-systems-to-clients)). Visit withmartian.com today and there is no router above the fold. The headline reads "Understanding Intelligence," and the copy describes "a team of researchers who've left the big labs to focus on understanding machine intelligence" ([withmartian.com](https://withmartian.com/)). The company now runs a $1M interpretability prize ([withmartian.com/prize](https://withmartian.com/prize)). Its GitHub trail tells the same story in commit dates: `withmartian/routerbench`, the routing benchmark, last moved on June 13, 2024, while every *active* Martian repository is now about evals and model steering ([routerbench](https://github.com/withmartian/routerbench)).

Unify, a Y Combinator W23 company, launched as "dynamically route each prompt to the best LLM and provider" ([YC](https://www.ycombinator.com/launches/L4t-unify-the-best-llm-on-every-prompt)). Its live repository today is `unifyai/unity`, described as "the agent runtime for our virtual colleagues" — a different product in a different category ([GitHub](https://github.com/unifyai/unity)). The old routing SDK is dormant.

The open-source star counts make the gap impossible to round away.

| Repo | Stars | Last pushed | Category |
|---|---|---|---|
| `BerriAI/litellm` | 49,747 | 2026-06-09 | Gateway |
| `Portkey-AI/gateway` | 12,011 | 2026-05-25 | Gateway |
| `lm-sys/RouteLLM` | 4,991 | **2024-08-10** | Semantic |
| `withmartian/routerbench` | 165 | **2024-06-13** | Semantic |
| `Not-Diamond/notdiamond-python` | 91 | 2025-12-11 | Semantic |

*GitHub API, June 9, 2026.* The two canonical semantic-routing repositories are both frozen at mid-2024. The gateways dwarf them: LiteLLM has roughly ten times RouteLLM's stars and Portkey more than double, and both were pushed within about two weeks of this snapshot. The category that got the headlines is invisible in the place builders actually vote.

![Bar chart: two tall gateway repos still alive versus three short semantic-routing repos marked in red and tagged frozen since 2024.](/post-images/2026-06-09-llm-router-three-products-one-name/github-graveyard.jpg)

## Why the clever router got squeezed

Semantic routing did not fail because the engineering was wrong. It failed because the thing it arbitraged disappeared.

A per-prompt router only earns its keep when there is a wide price-and-quality gap to exploit: an expensive model worth avoiding, and a cheap model that is nonetheless good enough. In March 2023, GPT-4 launched at $30 per million input tokens. By July 2024, GPT-4o mini delivered comparable quality at $0.15 per million, a roughly 200-fold drop in sixteen months ([TokenCost](https://tokencost.app/blog/ai-price-index)). GPT-4.1 nano followed at $0.10 per million input ([Simon Willison](https://simonwillison.net/2025/Apr/14/gpt-4-1/)). Gemini Flash and DeepSeek's cache-hit pricing sit in the same basement.

When the cheap tier already costs ten to sixty cents per million tokens, the absolute dollars a router can save on any single request collapse toward noise — while the latency it adds, the classifier it has to maintain, and the quality risk it introduces all stay fixed. The arbitrage narrows; the overhead does not. That is the squeeze, and a builder digging into a real deployment found exactly the ceiling the math predicts:

> Lots of excitement around Factory's model router and its 20% cost saving... After digging into the numbers, I think it's close to the ceiling. Here's why: routing only saves money on work a cheaper model can handle.
>
> — [@harry_uglow](https://x.com/harry_uglow/status/2062464593134280870), June 4, 2026
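The squeeze can be put in rough numbers. The sketch below reuses the prices quoted above ($30 per million tokens for the 2023 flagship, $0.10 to $0.60 for today's cheap tier) and pairs them for illustration. The request size, the fixed per-request overhead, and the routable share are hypothetical assumptions, and the ceiling formula assumes every request uses the same number of tokens.

```python
def saved_per_request(expensive_usd_per_mtok: float,
                      cheap_usd_per_mtok: float,
                      tokens: int,
                      overhead_usd: float) -> float:
    """Net dollars saved by downgrading one request, after fixed router overhead."""
    gross = (expensive_usd_per_mtok - cheap_usd_per_mtok) * tokens / 1_000_000
    return gross - overhead_usd


def savings_ceiling(routable_share: float,
                    cheap_usd_per_mtok: float,
                    expensive_usd_per_mtok: float) -> float:
    """Upper bound on the fraction of the bill a router can remove: only the
    routable share of traffic is discounted, and only by the price ratio."""
    return routable_share * (1 - cheap_usd_per_mtok / expensive_usd_per_mtok)


TOKENS = 2_000     # hypothetical request size
OVERHEAD = 0.0002  # hypothetical fixed cost per routed request, in USD

# Wide gap: a $30 flagship versus a $0.60 alternative.
wide = saved_per_request(30.00, 0.60, TOKENS, OVERHEAD)
# Narrow gap: the default is already a $0.60 model, the alternative costs $0.10.
narrow = saved_per_request(0.60, 0.10, TOKENS, OVERHEAD)

print(f"wide gap:   ${wide:.4f} saved per request")
print(f"narrow gap: ${narrow:.4f} saved per request")
print(f"ceiling at 50% routable traffic: {savings_ceiling(0.5, 0.10, 0.60):.0%}")
```

The overhead is the same in both scenarios, so it takes a far larger bite out of the narrow-gap saving. The ceiling function makes the quoted point explicit: traffic that cannot be downgraded contributes nothing, however good the router is.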

The academic literature arrived at the same place from the other direction. One 2025 paper, "How Robust Are Router-LLMs?", finds that reported routing gains are fragile and shrink sharply under evaluation stress ([arXiv 2504.07113](https://arxiv.org/html/2504.07113v1)). A 2026 follow-up, "When Routing Collapses," documents a degenerate failure mode where routers converge on a single model as the cost budget rises — defeating the entire premise ([arXiv 2602.03478](https://arxiv.org/html/2602.03478)). The savings were real in the benchmark and brittle in production. For a standalone company, brittle-in-production is fatal.

## The survivor sells upstream

There is one important exception, and it complicates the obituary rather than contradicting it.

**Not Diamond** raised a $2.3M seed led by defy.vc to build a router trained on evaluation and preference data ([FinSMEs](https://www.finsmes.com/2024/07/not-diamond-raises-2-3m-in-funding.html)). It is still operating in 2026 — but not by selling per-prompt routing to developers. In a June 2026 essay, CEO Tomás Hernando Kofman describes the company as "the world's largest vendor of intelligent model routing, powering auto-routing in OpenRouter" ([Forward Future](https://briefing.forwardfuture.ai/p/model-routing-will-control-the-future-of-economic-value)).

Read that carefully. The most successful semantic router survived by becoming a *component inside the aggregation layer*. The technique did not vanish; it got absorbed. Routing intelligence turned out to be a feature that the winning aggregation platform wants to offer, an "auto" setting next to the model picker, not a product a developer will integrate, pay for, and maintain on its own. The same destination shows up in how the largest consumer surface is being reframed:

> OpenAI is trying to turn ChatGPT from a chatbot into an intent router: one interface that understands what the user is trying to do, chooses the right model/tool/app/agent, and completes the workflow without making the user manually pick the product.
>
> — [@ollobrains](https://x.com/ollobrains/status/2063573753406406774), June 7, 2026

Routing as a feature of a platform people already use — not routing as a destination. That is where the selection problem lives now.

## Gateways became table stakes

The third category never had a glamorous pitch, which is exactly why it endured. A gateway sits in front of your providers and does the unglamorous work: failover when a provider 500s, key management, request logging, cost attribution, caching, rate limits. It does not claim to make you smarter. It claims to keep you up.
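A minimal sketch of the failover-and-logging core, assuming each provider is a plain Python callable that takes a prompt and returns text. The `Provider` alias, the function names, and the simulated providers are hypothetical. A real gateway would add the key management, caching, and rate limits listed above, and would catch narrower error types.

```python
import time
from typing import Callable, Optional

Provider = Callable[[str], str]  # hypothetical: takes a prompt, returns a completion


def call_with_failover(prompt: str,
                       providers: list[tuple[str, Provider]],
                       log: list[dict]) -> str:
    """Try providers in order, record every attempt, return the first success."""
    last_error: Optional[Exception] = None
    for name, call in providers:
        started = time.monotonic()
        try:
            result = call(prompt)
        except Exception as exc:  # a real gateway would catch narrower errors
            log.append({"provider": name, "ok": False, "error": repr(exc),
                        "seconds": time.monotonic() - started})
            last_error = exc
            continue
        log.append({"provider": name, "ok": True,
                    "seconds": time.monotonic() - started})
        return result
    raise RuntimeError("all providers failed") from last_error


def flaky(prompt: str) -> str:
    """Simulated provider outage."""
    raise ConnectionError("simulated 500")


def healthy(prompt: str) -> str:
    """Simulated working provider."""
    return f"echo: {prompt}"


attempts: list[dict] = []
answer = call_with_failover("hello", [("primary", flaky), ("backup", healthy)], attempts)
print(answer)
print([(a["provider"], a["ok"]) for a in attempts])
```

Nothing in this loop decides which model is smarter. It only keeps the request alive and leaves a record behind, which is the whole pitch of the category.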

**LiteLLM**, from BerriAI (YC W23), is the open-source anchor, with 49,747 GitHub stars and a commit on the day this was written ([GitHub](https://github.com/BerriAI/litellm)). **Portkey** raised a $15M Series A led by Elevation Capital in February 2026, positioning itself as a "unified control plane for production AI" that tracks $93M in customer spend across 24,000 organizations ([Portkey](https://portkey.ai/blog/series-a-funding)). These are real businesses solving a real operational problem.

But the gateway is commoditizing from above, and fast. Vercel shipped an AI Gateway, generally available since August 2025, with sub-20ms routing and the real dagger: zero markup on tokens, including BYOK, with $5 of monthly free credit ([Vercel](https://vercel.com/docs/ai-gateway/pricing)). Cloudflare's AI Gateway has been generally available since May 2024 and offers its core analytics, caching, and rate-limiting features free on every plan ([Cloudflare](https://blog.cloudflare.com/ai-gateway-is-generally-available/)). When two of the largest developer platforms give away the gateway as a customer-acquisition loss leader, the standalone gateway's pricing power erodes toward the cost of the differentiated parts — governance, guardrails, enterprise controls.

The consolidation has already produced its first casualty-by-acquisition. **Helicone**, the YC observability-and-gateway startup, was acquired by documentation company Mintlify in March 2026 and put into maintenance mode — security patches only, active feature development stopped ([Mintlify](https://www.mintlify.com/blog/mintlify-acquires-helicone), [Helicone](https://www.helicone.ai/blog/joining-mintlify)). The founders' parting lesson is a fitting epitaph for the whole "router" confusion:

> The knowledge layer, not the model, is what makes or breaks AI in production.
>
> — Helicone founders, [on joining Mintlify](https://www.mintlify.com/blog/why-we-joined-mintlify), March 11, 2026

The gateway, in other words, is a feature of a larger production-AI surface, not a durable standalone category. It survives; it just gets absorbed into platforms with a wider reason to exist.

## Where the value actually settled

The two-year fight over one word resolved into a clear shape, and it is not the shape the funding implied. The selection problem, picking a cheaper model per prompt, was the most-hyped and the least durable, because the price collapse erased the gap it arbitraged. The logistics problem, delivering the model you chose reliably and cheaply with no lock-in, was the boring one, and it built a unicorn. The reliability problem, keeping production up, was real and fundable and is now being absorbed into the platforms developers already pay for.

If there is a single lesson for the next infrastructure category that arrives wearing one word over three products, it is this: when a startup's entire value proposition is an arbitrage, check whether the gap it arbitrages is widening or closing. Semantic routing was a bet that the price-quality gap between models would stay wide enough to sell. The labs closed it. Aggregation made the opposite bet: that the *number* of models would keep growing and someone would need a single door to all of them. That bet is still paying, at twenty-five trillion tokens a week and climbing.

## Sources

- [OpenRouter Raises $113M CapitalG-led Series B (BusinessWire)](https://www.businesswire.com/news/home/20260526953416/en/OpenRouter-Raises-%24113-Million-CapitalG-led-Series-B-as-Weekly-Volume-Explodes-to-25T-Tokens)
- [OpenRouter more than doubles valuation to $1.3B (TechCrunch)](https://techcrunch.com/2026/05/26/openrouter-more-than-doubles-valuation-to-1-3b-in-a-year/)
- [a16z — Investing in OpenRouter](https://a16z.com/announcement/investing-in-openrouter/)
- [RouteLLM: Learning to Route LLMs with Preference Data (arXiv 2406.18665)](https://arxiv.org/abs/2406.18665)
- [LMSYS — RouteLLM blog](https://lmsys.org/blog/2024-07-01-routellm/)
- [lm-sys/RouteLLM (GitHub)](https://github.com/lm-sys/RouteLLM)
- [Martian — Understanding Intelligence (homepage)](https://withmartian.com/)
- [Martian — $1M Interpretability Prize](https://withmartian.com/prize)
- [withmartian/routerbench (GitHub)](https://github.com/withmartian/routerbench)
- [Unify — unifyai/unity (GitHub)](https://github.com/unifyai/unity)
- [Not Diamond — Model routing will control the future of economic value (Forward Future)](https://briefing.forwardfuture.ai/p/model-routing-will-control-the-future-of-economic-value)
- [BerriAI/litellm (GitHub)](https://github.com/BerriAI/litellm)
- [Portkey raises $15M Series A](https://portkey.ai/blog/series-a-funding)
- [Mintlify acquires Helicone](https://www.mintlify.com/blog/mintlify-acquires-helicone)
- [Helicone — joining Mintlify](https://www.helicone.ai/blog/joining-mintlify)
- [Vercel AI Gateway pricing](https://vercel.com/docs/ai-gateway/pricing)
- [Cloudflare AI Gateway is generally available](https://blog.cloudflare.com/ai-gateway-is-generally-available/)
- [How Robust Are Router-LLMs? (arXiv 2504.07113)](https://arxiv.org/html/2504.07113v1)
- [When Routing Collapses (arXiv 2602.03478)](https://arxiv.org/html/2602.03478)
- [TokenCost — AI Price Index](https://tokencost.app/blog/ai-price-index)
