Central Casting

THE NUMBER: 73.7 — Sakana Fugu’s score on SWE-Bench Pro, the hard one. It beat Claude Opus 4.8 at 69.2 and GPT-5.5 at 58.6. The catch, and it’s the whole story: Fugu isn’t a model. It never trained a frontier system, never bought a data center, never ran a pre-training run. It’s a conductor that hires the models, splits the work, checks it, and hands back one answer better than any of them could manage alone. The best models on earth just got out-scored by the thing that bosses them around — and the thing that bosses them around doesn’t know how to write a line of frontier code.

A lab in Tokyo called Sakana shipped Fugu on Monday, and most of the coverage treated it as another model launch. It isn’t. Fugu doesn’t compete with GPT-5.5 or Claude Opus or Gemini 3.1 Pro. It manages them. You send one request. Fugu reads it, decides which model is strongest for which slice of the job, farms the slices out, verifies the returns, and stitches one answer together. The academic guts come from two ICLR papers about coordinating models into Thinker, Worker, and Verifier roles. The marketing is simpler: send us the problem, we’ll cast the right talent.
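The read, split, farm out, verify, stitch loop described above can be sketched in a few lines. This is only an illustration under loud assumptions: the "models" are stub functions, and the names thinker, verifier, orchestrate, and casting are invented for this sketch rather than taken from Sakana's papers or product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: a function from prompt to answer.
# Nothing here is Sakana's API.
Model = Callable[[str], str]


@dataclass
class Subtask:
    kind: str    # e.g. "code" or "prose" (labels invented for this sketch)
    prompt: str


def thinker(request: str) -> List[Subtask]:
    """Split a request into slices. A real planner would itself be a model."""
    return [Subtask("code", f"implement: {request}"),
            Subtask("prose", f"document: {request}")]


def verifier(task: Subtask, answer: str) -> bool:
    """Toy check: accept any non-empty answer. Real verification is harder."""
    return bool(answer.strip())


def orchestrate(request: str, casting: Dict[str, List[Model]]) -> str:
    """Send each slice to its preferred model, falling back on failure."""
    parts = []
    for task in thinker(request):
        for model in casting[task.kind]:      # try models in preference order
            answer = model(task.prompt)
            if verifier(task, answer):        # keep the first verified answer
                parts.append(answer)
                break
        else:                                 # no model passed verification
            raise RuntimeError(f"no verified answer for {task.kind}")
    return "\n".join(parts)                   # stitch one answer together


# Usage with stub models
flaky = lambda prompt: ""                     # always fails verification
coder = lambda prompt: f"[coder] {prompt}"
writer = lambda prompt: f"[writer] {prompt}"

result = orchestrate("rate limiter", {"code": [flaky, coder], "prose": [writer]})
print(result)
```

The design point is that the orchestrator never produces an answer itself. It only decides who is asked, and whether what came back is good enough to keep.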

Sit with the name for a second, because Sakana picked it on purpose. Fugu is the pufferfish — the one a Japanese chef can serve you as a delicacy or kill you with, depending entirely on how it’s cut. The whole value is in the preparation, not the fish. That’s the tell. The fish is a commodity. The judgment is everything.

And the judgment is the part nobody priced.

The Switchboard Beats the Talent

For two years this industry asked one question and only one question: who has the smartest model? Every benchmark war, every launch-day leaderboard, every “SOTA on everything by a margin” tweet was a fight over the same crown. We played the game too — guilty, on the record, repeatedly. Fugu is the moment the question changes. The new one is who decides where the smarts get spent, and that’s a different company entirely.

A trader who goes by SightBringer put it cleaner than any analyst I read this week. A single model, he wrote, is like a single factory — powerful, expensive, and brittle. If it gets regulated, nerfed, priced out, or simply falls behind, the whole operation is exposed. An orchestrator is a logistics network. It doesn’t need to own a factory. It needs to know which factory to use, when, how to combine the outputs, and how to route around a failure. “The old question was: who has the smartest model. The new question becomes: who controls the allocation of intelligence.” Then the line that should keep a few CFOs at OpenAI up at night: allocation sits above production. The lab supplies capability. The orchestrator decides where capability gets used. That is how model brands disappear — the user stops caring whether Fable or Gemini or some open model answered the sub-problem, and the models become “interchangeable organs” inside a body that isn’t theirs.

Here’s the way to feel it in your gut. Fugu is DNS for intelligence. Almost nobody thinks about DNS. It’s the invisible switchboard that turns a name you type into an address a machine can find, and most people ride whatever resolver their provider hands them. The ones who know better point at Cloudflare or Google and get something faster and more complete. Fine so far, but here’s the wrinkle that makes the router a bigger deal than DNS ever was. A DNS resolver mostly changes how fast you reach the answer. Censorship and geo-blocking aside, every honest resolver hands back the same address. The intelligence router changes the answer itself. Pick the wrong resolver and you lose a few milliseconds. Pick the wrong model and you ship a broken database migration, a hallucinated contract clause, a diagnosis that’s confidently wrong. Same plumbing. Wildly higher stakes. The routing table isn’t a faster phone book. It’s a quality decision wearing infrastructure’s clothes.

And like DNS, the router is also where geography lives. DNS is where countries censor and geo-block; switch your resolver and you can route around a national firewall. Fugu does the same thing one floor up. It excludes the EU pending GDPR compliance, and its whole pitch is that if a provider gets export-controlled overnight (the way Washington pulled Fable and Mythos off the global market this month on ninety minutes’ notice), the router just stops dialing that number and books the work elsewhere. Speed, quality, sovereignty. The analogy is load-bearing in all three directions.
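Routing around a blocked provider is, mechanically, a filtered lookup. The sketch below assumes a hypothetical preference list per task and a hypothetical per-region blocklist; the model names, region codes, and the pick function are all invented for illustration.

```python
from typing import Dict, List, Set

# Hypothetical preference order per task, best first.
PREFERENCES: Dict[str, List[str]] = {
    "security-review": ["model_a", "model_b", "model_c"],
}

# Hypothetical blocklists: which models may not be used in which region.
blocked: Dict[str, Set[str]] = {"us": set(), "jp": set()}


def pick(task: str, region: str) -> str:
    """Return the first preferred model that is not blocked in the region."""
    for model in PREFERENCES[task]:
        if model not in blocked.get(region, set()):
            return model
    raise LookupError(f"no model available for {task} in {region}")


before = pick("security-review", "jp")
blocked["jp"].add("model_a")          # simulated overnight export control
after = pick("security-review", "jp")
print(before, after)
```

The caller's code does not change when the blocklist does, which is the whole appeal: the policy lives in the table, not in the application.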

Two Toll Roads, and Only One of Them Is Smart

We’ve been calling the model a rental for a month, until the word lost its sting. Let me put the sting back. The rental is now priced by the hour, dynamically, in a live spot market, by a very smart agent that does not care whose name is on the engine. Need to crack a hard security problem? That’s the expensive tier, and you pay up. Need to write a poem about tiki drinks? Haiku — no pun intended — does it for pennies. Need to score a stack of articles overnight? An open Chinese model sitting on your own virtual machine does it for the cost of the electricity. Same buyer, three radically different bills, sorted in real time by capability and by what’s even legal to use where you’re standing.

That dynamism is the entire point, and it’s why the value sits where it sits. Think of two toll roads stacked on top of each other. The bottom road is compute. Nadella’s Azure, Amazon’s AWS, Google Cloud, Musk’s Colossus, Oracle — they sell road-miles at a fixed rate, and they could not care less whether the car doing the driving is a beat-up Civic or a Ferrari. They bill the asphalt. It’s a wonderful business and a dumb one, in the sense that it captures volume but exercises no judgment. The top road is allocation, and it’s dynamic. It captures judgment, per query, and skims the spread between the cheap answer and the expensive one. That spread is not small. Aligned’s own data this month had the same class of coding job running at nine dollars on a premium model and a dollar-fifty on a cheaper one. Six to one. The router exists to live in that gap.
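To make the spread concrete, here is a minimal sketch of a router that picks the cheapest model clearing a quality bar. The two prices are the nine-dollar and dollar-fifty figures quoted above; the quality scores, the Offer type, and the route function are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Offer:
    name: str
    quality: float   # hypothetical score from 0 to 1, e.g. from a private eval
    price: float     # dollars per job


def route(offers: List[Offer], min_quality: float) -> Offer:
    """Pick the cheapest offer that clears the quality bar."""
    eligible = [o for o in offers if o.quality >= min_quality]
    if not eligible:
        raise ValueError("no model clears the bar")
    return min(eligible, key=lambda o: o.price)


offers = [
    Offer("premium", quality=0.95, price=9.00),   # price quoted in the text
    Offer("cheaper", quality=0.85, price=1.50),   # price quoted in the text
]

routine = route(offers, min_quality=0.80)   # both clear the bar; cheaper wins
hard = route(offers, min_quality=0.90)      # only the premium model clears it
spread = hard.price / routine.price
print(routine.name, hard.name, spread)
```

The ratio works out to 9.00 / 1.50 = 6, the six-to-one gap the router lives in. Everything interesting is hidden in the quality numbers, which is why the eval behind them matters more than the routing rule.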

Which brings us to the most interesting tell of the week, and it didn’t come from a startup. It came from Satya Nadella. In a Sunday interview with the Wall Street Journal, the CEO of Microsoft made the sharpest public case yet that AI’s next phase should be cheaper, swappable, and “trusted,” and scolded his own most important partners for telling the public that white-collar work is finished while demanding unlimited capital to build data centers. It read like conscience. Read the balance sheet instead. Microsoft’s Azure AI business runs at something like a $37 billion annual rate, and most of it is operating other companies’ models, not Microsoft’s own. Driving the price of inference toward zero is therefore close to free for the landlord and ruinous for the tenants — OpenAI and Anthropic, whose entire valuation is the model. Microsoft is weighing whether to host DeepSeek on Copilot. It shipped seven of its own MAI models, one of which it claims beats GPT-5.5 at a tenth of the cost. Its Copilot agent already routes your task to whatever’s cheapest. Every one of those moves points the same direction. Nadella isn’t having a change of heart about safety. He’s Fugu with a balance sheet, burning his partners’ boats and calling it social permission.

The Barbell, and the Middle That Disappears

If allocation is the game, the field flattens into a barbell. There are exactly two safe places to stand. Be the model on the absolute frontier, the one the router has to call when the job is hard enough that only the best will do. Or be the cheapest model that clears the bar on the average job, the one the router reaches for ten thousand times a day because it’s good enough and it’s free. Everything in the soft middle — the competent, mid-priced model that’s nobody’s best and nobody’s cheapest — gets routed around and compressed to zero. The router never calls it, because the router can’t think of a reason to.

And the middle isn’t a place you sit. It’s a slope you slide down. Eleven days ago, Fable was the best model the public had ever touched. GLM 5.2, the open Chinese model, is now good enough to be the bench player parked on your own hardware. China just narrowed the frontier gap to roughly seven months, and Huawei trained a 1.6-trillion-parameter model with zero Nvidia silicon, so the floor is rising faster than the ceiling. Today’s frontier is next quarter’s commodity. The death sentence in a routed world isn’t being in the middle. It’s being static: sitting still while the barbell rotates underneath you and the premium you used to charge evaporates.

There’s a cruelty in this that’s worth naming, because it’s the exact inverse of what we wrote yesterday. In Keeper of the Culture the argument was that the illegible work, the meshing and the judgment with no scoreboard, is the safest thing you own, precisely because no market can measure it to bid it away. Flip the lens to the supply side and the rule reverses. In a routed world, illegibility is death. A model survives the middle only if it is legibly, measurably the best at something — a vertical, a price point, a geography, a regulated task. If the router can’t name a reason it would ever pick you, you don’t get picked. You don’t get bid down. You simply cease to exist in the allocation, which is worse. The thing that protects the human integrator is the thing that kills the undifferentiated model.

The Real Asset Is the Routing Table

Now the part the email didn’t have room for, and the part that actually decides who wins.

Fugu’s edge is not the orchestration trick. The papers are published; the Thinker-Worker-Verifier idea will be cloned by August. The durable asset is the routing table itself: the living, continuously updated map of which model is best at which task, at what price, in which jurisdiction, right now. That map is a proprietary eval harness, and it’s the same asset we said Musk really bought when SpaceX paid sixty billion for Cursor: not a tool but a sensor, the best-positioned instrument for watching how the work actually gets done. We wrote on the 16th that the public benchmark died as a buying tool and private evals were the only test left that discriminates. Fugu is that sentence turned into a product. In a world where the models are free, the most valuable thing left is knowing which free model to use. The judgment moved up a floor and it’s still judgment.

Picture how this plays inside an enterprise, because this is an enterprise story — no consumer on a twenty-dollar plan is hitting these walls outside the coders and the hardcore design crowd. You hand Fugu a workflow. It runs the workflow across the whole field and comes back with a Chinese menu: here are your options, here’s the quality, here’s the price for each, which do you want. You pick. It implements, and then it stands ready to re-optimize forever. Feed it a genuinely different workflow and it notices — this one isn’t like the others, want me to optimize it too? Similar workflows get the same treatment; different ones get their own. It’s McKinsey that actually ships the recommendation, and never stops tuning it.

And here is the move that turns a clever feature into a moat. The smart version of Fugu aggregates every customer’s optimization and every outcome, updates its base recommendations in real time, and auto-routes new and improved models into the mix the day they land. That’s a two-sided data network, and it compounds. More customers means more outcome data means better routing means more customers. The routing table reigns precisely because it gets better with use. Which is also why the benchmark obsession misses the point: the clean benchmarks — code, math, retrieval — only exist for the legible tasks. The instant you’re writing a board memo the CFO won’t redline, setting a brand voice, or choosing Midjourney versus Google for an image, there is no leaderboard at all. That’s exactly where the routing value is highest, because anyone can route on SWE-Bench, but almost nobody can route well on “which model writes like us.” The fuzzier the task, the deeper the moat.
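The flywheel can be sketched as a table that records outcomes and routes on the observed success rate. This is a toy under stated assumptions, not a description of how Fugu works: the RoutingTable class, the task label, and the model names are hypothetical, and the add-one smoothing is simply one common way to give an untried model a neutral starting score.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class RoutingTable:
    """Tracks observed success rates per (task type, model) pair."""

    def __init__(self) -> None:
        self.wins: Dict[Tuple[str, str], int] = defaultdict(int)
        self.tries: Dict[Tuple[str, str], int] = defaultdict(int)

    def record(self, task: str, model: str, success: bool) -> None:
        """Log one customer outcome."""
        self.tries[(task, model)] += 1
        self.wins[(task, model)] += int(success)

    def rate(self, task: str, model: str) -> float:
        """Smoothed success rate; an untried model starts at 0.5."""
        key = (task, model)
        return (self.wins[key] + 1) / (self.tries[key] + 2)

    def best(self, task: str, models: List[str]) -> str:
        """Route to the model with the highest smoothed rate."""
        return max(models, key=lambda m: self.rate(task, m))


table = RoutingTable()
for ok in (True, False, False):
    table.record("board-memo", "model_a", ok)
for ok in (True, True, False):
    table.record("board-memo", "model_b", ok)

choice = table.best("board-memo", ["model_a", "model_b"])
newcomer = table.best("board-memo", ["model_a", "model_c"])  # model_c is untried
print(choice, newcomer)
```

Two things fall out of even this toy. Every recorded outcome sharpens the next recommendation, and a model that lands today gets a hearing as soon as it outranks a proven underperformer.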

One trap to flag, because the obvious business model is also the poisonous one. You might think the labs should just hand the router free credits to test them — keep the enterprise from paying for every audition, make the service feel premium. Skip it. The cost of a one-off test is so low that Fugu can bake it into its own pricing and never show the customer a per-test bill, which keeps the recommendation honest. The moment supplier money tilts the rankings, the router stops being a router and becomes an ad network, and you’ve rebuilt the issuer-pays model that turned Moody’s and S&P into rubber stamps before 2008, the rated party paying the rater until the ratings were worth nothing. The standard to hold is Michelin, not Yelp. A Michelin star is incorruptible because the restaurant can’t buy it; the inspectors even pay for their own meals. And the labs don’t need to be bribed to show up anyway — in a world where inference is dynamically allocated, an unmeasured model is an unbooked one, so every frontier lab will volunteer its capabilities to any router that matters, terrified of being the number the switchboard stops dialing.

We’ve Seen This Movie — It’s the Grid

None of this is new if you’ve watched a commodity mature. We said a few weeks back that the model is becoming electricity — cheap, fungible, and getting cheaper. Push the metaphor one more click and you get the punchline. When generation becomes a commodity, the money and the control don’t sit with the power plants. They sit with the operator who runs the grid — the dispatcher who decides, every five minutes, which plant fires next based on real-time marginal cost and what the wires can carry. That’s economic dispatch, the quiet seat of power in electricity for decades. Fugu is auditioning to be the grid operator for intelligence. The labs are the generators, and generators are essential, capital-intensive, and, once there are enough of them on the wire, price-takers. Allocation beats operation. It almost always does, once operation gets crowded. What’s new is only the speed: this commodity is maturing in quarters, not decades, which is why the seat is being contested now, eighteen months in, instead of a generation later.
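For readers who have not met economic dispatch, its simplest form is merit order: sort the plants by marginal cost and fill demand from the cheapest upward. The sketch below uses invented plants and prices, and it ignores the transmission limits ("what the wires can carry") that real operators also have to respect.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Plant:
    name: str
    capacity_mw: float
    marginal_cost: float   # dollars per MWh; all numbers here are invented


def dispatch(plants: List[Plant], demand_mw: float) -> Dict[str, float]:
    """Merit-order dispatch: fill demand from the cheapest plant upward."""
    schedule: Dict[str, float] = {}
    remaining = demand_mw
    for plant in sorted(plants, key=lambda p: p.marginal_cost):
        if remaining <= 0:
            break                              # demand met; costlier plants idle
        output = min(plant.capacity_mw, remaining)
        schedule[plant.name] = output
        remaining -= output
    if remaining > 0:
        raise ValueError("demand exceeds total capacity")
    return schedule


plants = [
    Plant("gas_peaker", capacity_mw=200, marginal_cost=90.0),
    Plant("wind", capacity_mw=150, marginal_cost=0.0),
    Plant("combined_cycle", capacity_mw=300, marginal_cost=40.0),
]
schedule = dispatch(plants, demand_mw=400)
print(schedule)
```

At 400 MW of demand the expensive peaker never fires. Swap plants for models and megawatts for queries and the same loop is a first approximation of what a router does to a mid-priced model that is neither cheapest nor best.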

The Catch

I’d be selling you a clean story if I stopped there, and the story isn’t clean. The flywheel that makes the router powerful rewards whoever already sees the most enterprise workflows, and that is not a plucky Tokyo startup. It’s the clouds. Azure, AWS, and Google watch a staggering share of the world’s actual AI work already, which means any of them can clone the orchestration papers and feed its own routing table more real workflow data on day one than Sakana will see in a year. Sakana has the head start and the better idea. The hyperscalers have the distribution. So the honest question isn’t whether the router is the future. It is. The question is whether the independent router survives, or whether the best routing table ends up bundled, for free, with the cloud you already pay every month. My money says the seat matters more than the occupant, and the fight to fill it is the real story of the back half of this year.

Either way, the instruction for a business owner is the same, and it’s blunt. Stop asking which model to standardize on. That’s last year’s question and it has a shelf life measured in weeks. Start asking whether you own the layer that decides where your intelligence — and your margin — gets spent, or whether you’ve handed that decision to a vendor whose default resolver is optimized for the vendor.

Because the model is the talent now, and the talent is cheap and getting cheaper. Every model now auditions for every job, and the only seat that matters holds the casting sheet. Own it, or let a vendor’s default decide who you hire.
