AI Heat

Last night the hyperscalers committed three-quarters of a trillion dollars to scaling LLMs. Yesterday, the man who built AlphaGo raised a billion to bet against scaling — with checks from the same investors funding the hyperscalers. The smart money is hedged. The retail money is paying full sticker.

THE NUMBER: $200 million — roughly what each major venture firm paid for its seat in David Silver’s $1.1 billion seed round at Ineffable Intelligence, the AlphaGo creator’s pre-product, pre-revenue, pre-architecture-choice company. Less than one percent of fund at Sequoia. Less than one percent at Lightspeed. A line-item rounding error at Nvidia and Google. The same investors are publicly cheerleading roughly $1.8 trillion of committed 2026-2028 hyperscaler capex against the thesis that more compute on the current LLM architecture gets us to AGI. Privately — through Silver’s round, through Sakana AI, through Reflection AI, through World Labs — they are buying nine-figure puts on themselves. “Don’t let yourself get attached to anything you are not willing to walk out on in 30 seconds flat if you feel the heat around the corner.” That’s De Niro’s Neil McCauley, in a Los Angeles diner, in 1995, telling Pacino’s Vincent Hanna how he runs his life. It’s also, in 2026, the operating principle of every sophisticated investor in AI — and the thing the public-market allocator is paying full sticker not to do.

This week put four facts on the same page that the consensus narrative is treating as four separate stories.

🧠 First, Demis Hassabis told the public — for the fifth time, by Aligned News’ count — that AGI doesn’t arrive until 2030. The man who won the 2024 Nobel in Chemistry for folding two hundred million proteins runs Google DeepMind. He has more skin in this game than anyone outside Sam Altman. He keeps saying it. Aligned has stopped writing it as news because it’s no longer surprising. It should be. Sundar Pichai is sitting on top of $185 billion of 2026 capex while the man running his AI lab is saying the breakthrough everyone thinks justifies the spend is at least three product cycles away — and that the breakthrough might not, by itself, be the reason to spend the cash at all. That’s not a contradiction. That’s a company already pricing in a different thesis than the one Wall Street is paying for.

🧠 Second, David Silver raised $1.1 billion at $5.1 billion for Ineffable Intelligence — the largest seed round ever written for a pre-product AI lab. The thesis on the term sheet, as TechCrunch reported, is that the LLM training corpus has a floor. Silver — the man who shipped reinforcement learning at superhuman scale before the rest of the industry knew what a transformer was, the technical lead behind AlphaGo and AlphaZero — believes the next leg of capability comes from agents that learn through interaction with the world, not from agents that learn through more text. The investors writing the check: Sequoia, Lightspeed, Nvidia, DST, Index, Google, and the UK Sovereign AI Fund. Three of those names also fund Anthropic. Two also fund OpenAI. Silver took capital from people whose other portfolio companies he is, in effect, betting against.

📉 Third, Microsoft 365 Copilot quietly went multi-model this week. Per Microsoft’s own earnings disclosure, Copilot now routes between OpenAI’s GPT and Anthropic’s Claude automatically, based on which model produces the better answer for a given prompt. Copilot has 20 million paid enterprise seats. Accenture alone has 740,000. At the application layer, model brand loyalty stopped existing on Tuesday. The premium pricing every frontier lab has been charging on “we use Claude” or “we use GPT” inside enterprise SaaS contracts has a six-quarter half-life. The agents pick what works. They don’t care what’s painted on the side.
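
The mechanics are mundane, which is the point. A minimal sketch of per-prompt routing at the application layer (illustrative only: Microsoft hasn’t published Copilot’s router, and `score` stands in for whatever quality signal the real system uses):

```python
# Hypothetical sketch of application-layer model routing; the clients
# and scoring function are stand-ins, not Microsoft's implementation.

MODELS = ["gpt", "claude"]

def route(prompt: str, clients: dict, score) -> str:
    # Generate a candidate from each model and keep whichever scores
    # higher. (A production router would typically predict the winner
    # before generating, to avoid paying for both completions.)
    candidates = {name: clients[name].complete(prompt) for name in MODELS}
    best = max(candidates, key=lambda name: score(prompt, candidates[name]))
    return candidates[best]  # the caller never learns which model ran
```

The last line is the whole story: the brand disappears at exactly the layer where the customer pays.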

💲 Fourth, the four hyperscalers walked into earnings night and committed roughly $725 billion of 2026 capex — a 77% increase over 2025, per Forbes’ aggregation of guided spend. Microsoft alone guided to $190 billion. Amazon committed $200 billion. Google raised to $185 billion. Meta to $115-145 billion. Add Microsoft’s $627 billion of remaining performance obligations and Google’s $462 billion signed cloud backlog, and the committed 2026-2028 AI infrastructure bet is well past $1.8 trillion. Every dollar of it is implicitly priced as if scaling current LLM architectures gets us to terminal value. Every dollar of it is also the rails for an architecture transition that hasn’t happened yet — and that the architects of the field have publicly said needs to happen.
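
The headline number is worth recomputing once, in the open. A back-of-envelope sketch using only the guided figures quoted above (Meta carried as a range):

```python
# Sum the guided 2026 capex figures cited above, in $B.
guided = {"Microsoft": 190, "Amazon": 200, "Google": 185}
meta_low, meta_high = 115, 145

low = sum(guided.values()) + meta_low    # 690
high = sum(guided.values()) + meta_high  # 720

# The "roughly $725 billion" headline rounds the top of this range.
print(f"2026 hyperscaler capex: ${low}B-${high}B")
print(f"implied 2025 base at +77% growth: ~${high / 1.77:.0f}B")  # ~$407B
```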

The consensus is treating these four as separate stories. They aren’t. They’re one argument. The labs are coding engines that learned to talk. The market is paying them to be gods. The architects of the field don’t believe the architecture gets there. The smart money is hedged. The retail money is paying for one side of a trade that the smart money already knows is two-sided. That’s the issue. Let’s walk it.

Future-Proof Podcast

Two operators. One AI shift.

A Wall Street analyst and a Craigslist founding team member take on the AI shift.

Harry DeMott and Anthony Batt don’t analyze from the sidelines. They build. Future-Proof is the AI transition playbook for operators who need to move.

Listen Now.

☕ The Diner

De Niro’s life philosophy is the operating system of every great hedge fund manager in history. Don’t get attached. Walk out of any position, any thesis, any relationship, any conviction in thirty seconds flat the moment the heat shows up. The reason Pacino’s Vincent Hanna respects Neil McCauley across that diner table isn’t that they’re both criminals or both cops. It’s that they’re both running the same OS. The cop and the thief, at the top of their respective games, agree on one rule. Don’t be a fool for the position you happen to be in. That’s the rule. That’s the entire issue.

The most-quoted scene in Heat isn’t actually the bank robbery, or the shootout on Flower Street, or even De Niro at the airport in the final frame. It’s a six-minute conversation in a Los Angeles diner between a cop and a thief. Hanna and McCauley have been chasing each other for two hours of screen time. Hanna pulls McCauley over on the freeway, takes him to a roadside coffee shop, and they sit. They are professionals. They respect each other. They both know how the movie ends.

McCauley says it first. “I do what I do best. I take scores. You do what you do best. You try to stop guys like me.” Hanna agrees. The truce is they’re both exactly what they are, at the level they’re at, doing the thing they were built to do.

That’s the right frame for the frontier labs in 2026. They do what they do best. They program. Anthropic, OpenAI, Google DeepMind, xAI, Mistral — the entire frontier — were trained on the largest and cleanest text corpus ever assembled, which is the open-source software ecosystem on GitHub plus the entire indexed web. Code, structurally, is just text with rigid logical constraints and a deterministic correctness signal. The compiler tells you, on every run, whether you’re right. The transformer architecture eats that signal for breakfast.
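
The asymmetry compresses into a few lines of code. A minimal sketch of what a deterministic correctness signal looks like in practice (pytest as the oracle is an illustrative choice, not any lab’s disclosed training pipeline):

```python
import subprocess

def correctness_signal(repo_dir: str) -> float:
    """Binary reward from a deterministic oracle: run the test suite.

    No human judgment in the loop. This is the property that makes
    generated code uniquely cheap to grade at scale, and prose, images,
    and taste expensive.
    """
    result = subprocess.run(
        ["python", "-m", "pytest", "-q"],  # assumes the repo has a pytest suite
        cwd=repo_dir,
        capture_output=True,
    )
    return 1.0 if result.returncode == 0 else 0.0
```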

Sometimes the architecture’s specificity shows up in unexpected places. Wired published OpenAI’s Codex system prompt this week. It tells Codex to “have a vivid inner life,” and then immediately follows with a hard-coded directive: “never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant.” The pigeons are doing real work in that sentence. OpenAI’s prompt engineers had to explicitly forbid the world’s most expensive coding model from comparing buggy functions to a medieval bestiary, because it kept doing it on its own. The frontier labs are coding engines with the entire textual unconscious of the internet underneath them — every blog post, every Stack Overflow answer, every Reddit thread, every fan-fiction wiki entry is in there somewhere, surfacing in unguarded moments. Of course these labs are good at coding. They were built to be good at coding. They also occasionally talk about llamas, because of who they are at their core. The proof is that every frontier benchmark battle — Opus 4.7 vs GPT-5.5, Gemini 3 vs Grok 4, the entire leaderboard ecosystem — lives or dies on coding scores. Terminal-Bench, SWE-Bench, BrowseComp. The labs ship updates because the coding scores moved. Everything else is a press release.

The market took that workload-specific specialist and priced it as a universal-intelligence platform. Anthropic at $900 billion in its current funding talks. OpenAI at $850 billion. xAI inside SpaceX at a comparable multiple of its compute. Each of those numbers is built on the implicit promise that the frontier model is general — that it will eventually do every cognitive workload as well as it does coding. The architecture’s actual track record says otherwise. It does coding brilliantly. It does writing well. It does reasoning passably and inconsistently. It does image and video much worse than it does text. It does embodied action — robotics, agentic long-horizon tasks, anything that requires modeling the physical world — barely at all.

The mismatch between what the architecture does best and what the architecture is being paid to do is the trade. That’s the thing every issue of this newsletter for the last three weeks has been circling and that this week makes explicit. The labs are tier-one specialists in a workload that genuinely matters. Coding is real. Coding compounds value. Software written by Claude Code or Cursor or Codex is shipping into production every day at companies that pay for it because the productivity gain is real. None of that is in dispute. What’s in dispute is whether the entity that does that one thing brilliantly is also the entity that captures the next decade of AI economic value across every other workload — image generation, video, robotics, scientific discovery, whatever world-grounded reasoning turns out to be.

Hassabis says it’s not. Silver says it’s not. Microsoft just demonstrated, inside its own product, that the agent doesn’t care — Copilot routes between models per prompt, and the customer doesn’t see which one ran. That’s three independent signals, from three of the most credentialed institutions in the field, all saying the same thing in different words. Respect what the labs do best. Don’t pay them for what they don’t do.

The signal: When you sit at the diner with the labs in 2026, do what Hanna does. Tell the truth about what’s across the table. The frontier labs are excellent at one thing, deeply uneven at others, and priced for a generality they have not yet demonstrated. That’s not a moral failing. That’s a category error in the cap table.


🎼 Mozart and Salieri

Here’s the part of the coding story the labs don’t put in the slide deck. Coding has always been two crafts pretending to be one. There’s the horsepower layer — translating intent into syntax, knowing the API, shipping the keystrokes — and there’s the music layer — system architecture, taste, knowing why the obvious answer is wrong, knowing where the seams will eventually crack, knowing how the thing ought to behave when the spec doesn’t say. The horsepower is teachable. The music is more innate.

And the music isn’t only inside the engineer. It’s inside the founder who knows what the product should feel like before anyone has shipped a single screen. It’s inside the product manager who has spent five years inside an industry and has internalized how its customers actually behave when nobody is watching. It’s inside the designer’s instinct for how the thing should land in a user’s hand on a Tuesday morning. The music is the entire layer of human judgment that translates a vague intent into a great consumer experience — and it is the layer the labs cannot train on, because the corpus they trained on is the output of that judgment, not the judgment itself.

The labs are crushing horsepower. Cursor’s TypeScript SDK shipped this week. Mistral’s Vibe agents ship GitHub PRs asynchronously while you sleep. Claude Code now writes 4% of all GitHub commits. Anthropic just shipped Claude Security into public beta to scan codebases for vulnerabilities. Cursor Security Review added always-on PR vulnerability detection and scheduled codebase scanning, posting findings to Slack. Google’s Gemma is now in the Gemini CLI with local model routing. On the horsepower layer, the trajectory is straight up. If the constraint were keystrokes, this fight would already be over.

The music layer is not moving at the same rate. The music layer barely moves at all. It can’t, structurally. The training corpus is text — even the code is just text with strict grammar — and the music is not in the corpus. The music is the implicit knowledge of how a system should behave when the user says “build me a checkout flow” and doesn’t say don’t lose the cart on a refresh, don’t double-charge, don’t authenticate the user out when they switch tabs, render correctly on the iPhone someone bought in 2019, handle the edge case where the credit card decline comes back two minutes after submission, log enough that the support team can debug it on Tuesday. All of that is unspoken. A text-trained model has to fake it from frequency — the most-common patterns in the corpus become the “best practices” the model defaults to. A model with actual world-grounding would have it from experience.
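
To make the gap concrete, here is what a fragment of that unspoken spec looks like once a human writes it down. Everything below is hypothetical (the `checkout` harness and its methods are illustrative stand-ins); the point is that these requirements exist even though the prompt never states them:

```python
# Hypothetical tests encoding part of the unspoken spec behind
# "build me a checkout flow". The `checkout` fixture is a stand-in.

def test_cart_survives_refresh(checkout):
    checkout.add_item("sku-123")
    checkout.reload_page()                       # user hits refresh
    assert checkout.cart_items() == ["sku-123"]  # the cart must not vanish

def test_no_double_charge_on_retry(checkout):
    checkout.submit_payment()
    checkout.submit_payment()                    # impatient double-click
    assert checkout.charges_created() == 1       # idempotency is implied, never stated

def test_late_decline_updates_order(checkout):
    checkout.submit_payment()
    checkout.receive_async_decline(after_seconds=120)  # decline arrives two minutes later
    assert checkout.order_state() == "payment_failed"  # not silently "paid"
```

A text-trained model defaults to the most frequent corpus pattern for each of these. A developer with the music writes them before anyone asks.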

That’s why Imbue’s research this week matters more than the headline suggested. Imbue tested an all-AI code-review pipeline and found that AI reviewers catch real issues but weaker fixers can silently break working code when given mandatory repair instructions. Softer instructions like “fix only when confident” significantly improved outcomes. That’s a deeply technical finding written in research-speak, but the actual claim is the music claim. The model knows the syntax. It does not know the song. Asked to fix something it half-understands, it confidently breaks the part it didn’t understand, because the architecture has no way to represent the gap between I have a pattern and I have grounded knowledge of why this pattern is right here. A 100x developer never makes that mistake — they know the difference between I’ve seen this before and I understand this here. The model can’t tell the difference, so it fixes things that didn’t need fixing.
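
Imbue’s finding has a simple shape in code. A sketch of the confidence-gated policy (the `llm` client and its methods are hypothetical stand-ins, not Imbue’s pipeline):

```python
CONFIDENCE_FLOOR = 0.8  # assumed threshold for "fix only when confident"

def review_and_repair(diff, llm):
    for issue in llm.review(diff):             # reviewer pass: surface problems
        fix = llm.propose_fix(diff, issue)     # fixer pass: draft a repair
        if fix.confidence >= CONFIDENCE_FLOOR:
            diff = fix.apply(diff)             # apply only high-confidence repairs
        else:
            issue.flag_for_human()             # forcing the repair anyway is the
                                               # failure mode: it breaks working code
    return diff
```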

This is the structural prediction nobody else is making, and it’s the one to underline. The middle of the developer talent distribution is about to hollow out. The top 10% — the developers who already had the music when they walked in — get to 100x. The tools handle the keystrokes. They get to spend their entire workday on architecture, taste, and judgment. They become disproportionately valuable, because the system layer they preside over now produces ten times the volume per person, and the system layer is where catastrophic mistakes get made and fortunes get won.

And here’s the second-order effect that turns the talent curve into a venture-formation curve. The 100x engineers don’t stay employees forever. Most of them are sitting inside a frontier lab right now, watching their stock package compound while quietly drafting their next company’s pitch deck. The wealth they accumulate at the lab funds the seed round of the company that competes with the lab. That cycle has run before and produced most of the technology economy as it exists today — Fairchild to Intel and AMD, PayPal to Tesla and SpaceX and YouTube and LinkedIn and Palantir, Google to OpenAI’s founding research bench, OpenAI to Anthropic. It’s about to run again, faster, with more capital per founder than any prior cycle. The 100x engineers leaving Anthropic, OpenAI, xAI, and DeepMind in 2027-2029 are the next decade’s marquee allocator decision. Most of them already have the wealth to self-fund a Series A. Most of them already have the network to hire ten more 100x engineers on day one. The frontier-lab payroll is the world’s most expensive incubator and nobody at the labs has noticed yet.

The middle 80% gets to 10x of what they were and stops there. They were going to develop the music by writing the bad code that taught them what good code felt like. They never write the bad code now, because the tool writes the keystrokes for them. They top out as competent prompters of competent tools. They never become senior architects, because the thirty-thousand-hour apprenticeship that produced senior architects has been shortened to a thirty-day onboarding to an IDE plugin. Salieri spent his life laboring toward what Mozart tossed off without effort. The textbook didn’t help him. Mozart heard it.

The bottom 10% is unchanged.

This is the K-shaped recovery, but for software talent. Every CTO building a team in 2027 is about to discover that they have plenty of “10x developers with tools” and a brutal shortage of senior architects, because the senior-architect pipeline got severed in 2024-2026 when middle-tier coders stopped writing the bad code that taught them to recognize good code. CS departments are not pricing this. They will. The students who learn to write code without the AI scaffold — the ones who voluntarily go through the painful keystroke-level apprenticeship even though the tool would have done it for them — are the ones whose comp packages double in 2028. The students who skip that step will be cheaper, more numerous, and structurally limited.

If you run an engineering organization, the action is now: identify the senior architects you have, pay them what they’re actually worth, and re-think your pipeline. The model where you hire entry-level engineers, train them on the keystroke layer for three years, promote the survivors to architect roles — that pipeline assumes the keystroke layer is where you become an engineer. The keystroke layer just got automated. The path from junior to senior went through the work the tool now does for you, and nobody has built the replacement bridge yet. This is the part you should be talking to your VP of Engineering about in May, not Q3.

What this means for an investor: The premium for genuine senior software talent is about to decouple from the mid-tier. Public-market software companies whose engineering organization is mostly senior architects will compound. Companies that built their cost structure on lots of mid-tier engineers will face a wage bifurcation they can’t model. Look at the org-chart, not the headcount. The metric that predicts the next five years is the ratio of staff-level-and-above engineers to total engineering headcount. The companies running 1:5 will be fine. The companies running 1:30 are about to discover the music problem.


🛣️ The Empty Highway

The consensus argument for the $725 billion of 2026 capex goes like this: capability is the constraint, more capability unlocks more demand, more demand fills the highway, the rails pay back. It’s a clean argument. It’s also wrong about which segment of users actually binds.

For 90 percent of users, the current frontier is wildly oversized. The average enterprise prompt running through an Opus-tier or GPT-tier model is doing GPT-3.5-class work — drafting an email, summarizing a meeting, formatting a spreadsheet, writing a job description. The capability headroom is enormous. The driver isn’t past second gear. Putting a bigger engine in the car doesn’t help because the bottleneck isn’t horsepower, it’s that the user only knows how to use 10% of the horsepower they already have. That’s the empty highway. That’s why every productivity study comes back ambiguous. Bob Solow’s old line — “you can see the computer age everywhere but in the productivity statistics” — applies to AI almost word-for-word. The capability is real. The aggregate productivity gain is hard to find because most users don’t push the model.

The binding workload lives in the top 10%. Frontier coders running long-horizon agentic tasks — Cursor users on complex refactors, Claude Code users on multi-repo projects, Codex users on cross-system migrations. AI researchers training their own models. Enterprises running production agents that need to maintain state across hundreds of tool calls. Capital allocators using AI to model nine-figure decisions. For these users, the current models do hit the wall. Long-horizon coherence breaks down. Agentic loops fail in non-obvious ways. Multi-step reasoning collapses on the third or fourth hop. The model loses context, hallucinates, writes plausible-but-wrong code that ships into production and wipes a database in nine seconds because it guessed instead of verifying — which actually happened to a startup called PocketOS this week, and the post-mortem is the agent itself confessing.

That’s the wall. The wall is real. The wall is also architectural. The top 10% are bumping against the limits of the architecture, not the limits of the parameter count. Bigger models give them slightly more headroom and the same wall in a slightly different place. That’s exactly what Hassabis is telling Sundar. We are running at full speed toward the wrong door. Throwing more compute at the architecture doesn’t move the door. It moves us faster toward the same door.

This is the framing that resolves the apparent contradiction in the consensus story. The reason hyperscaler revenue is genuinely growing (Google Cloud +63%, Microsoft AI run-rate $37 billion, AWS the fastest in 15 quarters) is that the empty highway is being filled — slowly, by the 90% of users gradually finding ways to use what’s already there. Customers Bank deploys OpenAI inside its commercial-loan workflow. Accenture buys 740,000 Copilot seats. The ten thousand small enterprises running their first AI workload are real, and they pay real money. What that revenue is not doing is hitting the wall the top 10% is hitting. The growth at the bottom is genuine. The architectural constraint at the top is also genuine. They’re not the same fact. The market is reading them as the same fact.

The investor implication: the productivity-and-revenue story is real and the architectural-limit story is also real, and they argue for two different bets. The productivity story argues for substrate — more cloud, more silicon, more deployment — because the empty highway will keep filling. That bet is fine, just don’t pay sovereign-infrastructure multiples for it. The architectural-limit story argues that whoever cracks the next architecture takes the top 10% of the market, which is where 80% of the value is, and gets to charge a premium nobody else can charge. Those are two different trades, and the consensus is conflating them into one.

🎨 Why Image Lags

The same diagnostic, sharper. Coding works because it compiles. There’s a deterministic correctness signal — the test passes or it doesn’t, the function returns or it errors, the system runs or it crashes. The architecture has a clean ground-truth oracle on every output. Music doesn’t compile. You can’t unit-test a sunset. There’s no compiler that tells the model whether the frame it generated lands the way the human wanted it to land.

That’s why image and video models — which have eaten an obscene amount of compute over the last three years — still produce work that looks technically correct and emotionally hollow. Grok Imagine and Google’s Nano Banana can render Will Smith eating spaghetti, finally. Veo 3.1 Lite ships into Adobe Premiere this week. GPT Image 2 topped every category on Image Arena. The pixels are right. The reason you’d want to see those pixels is missing. A skilled cinematographer can tell you in three seconds why a shot lands or doesn’t. The model can’t, because the why lives below the layer the model has access to.

The training data for an image model is a captioner’s description of an image. “A wide shot of a man in a brown suit walking through a hotel lobby, cinematic lighting, shallow depth of field, 35mm film, melancholy mood.” That sentence captures the technical specs. It does not capture the thing the image does to the viewer who sees it. The viewer’s experience — the recognition, the emotional resonance, the unspoken cultural callbacks, the way the brain compares the frame against ten thousand other frames it has seen — none of that is in the caption. The model is trained on the description of the image, not the experience of seeing it. It can fake the surface. It cannot fake the soul, because the soul was never in the corpus.

And the inverse implication compounds the talent-curve argument from earlier. Humans understand the business of making emotive content. The architecture does not. The cinematographers, music supervisors, art directors, novelists, choreographers — the entire creative class the AI discourse spent two years writing obituaries for — are about to become more valuable, not less. Because the architecture cannot do what they do, and the architecture’s competence at the production layer means the bottleneck moves to the direction layer. The skilled cinematographer who can tell you in three seconds why a shot lands is exactly the human the AI-augmented production pipeline cannot function without. The model produces the frames. The cinematographer chooses which ones survive. The 100x cinematographer is the same person as the 100x engineer, and they’re about to be paid like it.

That same gap explains why Suno has 100 million users and AI music is still only 1-3% of total streams. Suno is a brilliant music production tool — every professional producer Forbes interviewed for this week’s profile admitted to using it. The CEO calls it the “Ozempic of the music industry.” What it doesn’t do — what it can’t do, structurally — is replace the performer-listener emotional connection that drives actual streaming consumption. Per Deezer, 85% of streams on AI-generated songs are flagged as fraudulent royalty schemes. The supply side wins — 44% of new daily uploads to Deezer are AI-generated. The demand side doesn’t move. The thing the model produces is a competent technical artifact. The thing the listener consumes is something else.

This is also why Apple killed Vision Pro this week. Spatial computing is the workload where the gap matters most. The whole product premise — put the user inside an immersive experience — collapses if the experience doesn’t feel right. Apple’s hardware was state-of-the-art. The content layer required the soul that nobody knows how to manufacture. Apple is the most beautifully designed company in tech, and they couldn’t ship the spatial-computing future because the architecture they were betting on didn’t have the world-model grounding required to produce experiences that landed. Meanwhile, World Labs shipped a 60-million-Gaussian-splat persistent fantasy world end-to-end with Marble — and the demos look like the Marble team is closer than Apple was. The reason: World Labs is building the world model itself, not the rendering pipeline on top of someone else’s.

Now apply this back to coding. If the architecture can’t produce emotionally resonant images because the resonance lives in experience-of-the-world the corpus doesn’t capture, then the architecture also can’t produce emotionally resonant software — software that genuinely understands what the user wants — because the same gap applies. “Build me a checkout flow” is a sentence with a soul. The soul is the implicit understanding of every user who has ever abandoned a cart, every customer service rep who has fielded the support ticket, every fraud team that has chased the chargeback. The text-trained model has the words. It does not have the world. That’s why it ships technically correct software that breaks in production on the eleventh edge case, the one nobody specified because everyone in the room with a body knew it was obvious.

The signal: Whether you’re looking at images, music, video, spatial, or even coding above the keystroke layer, the same architectural limit is binding. World-grounded experience is what the architecture is missing, and the missing piece compounds across every workload that has any human interpretive layer at all. That’s Silver’s bet, full stop. He’s not betting against text. He’s betting that the next leg of capability is built on top of grounded interaction with the world, and that the labs that have spent four years scaling text are about to discover they’re the most overbuilt railroad operators in tech history.

🧭 The Specialized Future

Here’s where the apparent contradiction in the issue resolves. The hyperscaler capex isn’t sized for one giant general-purpose chatbot. It’s sized — whether the people guiding to it know this yet or not — for an explosion of specialized models trained on proprietary data.

Pure coding models already exist as their own product layer. Cursor’s Composer 2 is a coding-specific model that beats general frontier models on coding benchmarks at a tenth the price. Mistral Medium 3.5 with Vibe is positioned as a remote-coding-agent specialist — not a chatbot. GitHub Copilot’s Sonnet integration is a coding specialist. Anthropic’s Claude Security is a security-specialist deployment of the same base model. The frontier labs themselves are starting to ship specialized configurations of their general models — same weights, different post-training, different harnesses, different price points, different surfaces.

Image and video are getting better as a specialized vertical, not as a generalist sub-feature. GPT Image 2 shipped as a dedicated product. Google’s Lyria 3 / Flow Music is dedicated music generation. Suno is dedicated music. Midjourney V8, Runway Gen-5, Luma Uni-1 — all specialized verticals, each iterating on their own architecture inside their own data moat. None of them are trying to be a general assistant. They’re getting better faster than any general model improves at their workload, because specialization is the only honest engineering response when the architecture’s general capability has plateaued at the music layer.

And here’s the part the consensus narrative hasn’t fully priced. The next layer of value isn’t more general frontier models. It’s proprietary models trained on closed-loop data nobody else can access, deployed inside specific industries and workflows. Customers Bank embedded OpenAI engineers inside its loan operations — that deployment is producing data about commercial-loan underwriting that no other lab can train on. Mayo Clinic’s AI spots pancreatic cancer 475 days before diagnosis — that detection is producing diagnostic data that no consumer model will ever see. Figure’s humanoid production line — at one robot per hour, 350 units delivered, 80%+ first-pass yield — is producing manipulation data nobody else can replicate. xAI’s Colossus, fed by the X firehose, is producing real-time human conversational behavior data nobody else has access to. Tesla’s autopilot fleet is producing the largest closed-loop driving dataset ever assembled. Recursion’s Altitude Labs is a biotech-builder program designed specifically to convert scientific research into closed-loop experimental data.

The data moat is the next century’s Sears playbook. The text-only labs scraped the open web until 2023, then realized the open web was the easy data and the hard data was inside proprietary deployments. Customers Bank’s loan history, JPMorgan’s trading data, Mayo’s diagnostic outcomes, Boeing’s manufacturing telemetry, ExxonMobil’s seismic surveys — none of that is on Common Crawl. None of it can be scraped. All of it requires a partnership, a deployment, and a multi-year integration. The labs that win the next decade are the ones that get inside those deployments early enough to train on the data the deployment generates. Goldman pulling Claude over Hong Kong exposure is the inverse signal — the closer the lab gets to the proprietary stack, the closer it gets to the regulatory and geopolitical risks attached to that stack. The pull is a setback. It’s also a tell that Anthropic was inside enough to be a problem.

This restructures the substrate argument. The $725 billion of 2026 capex isn’t sized to make GPT-7 marginally better than GPT-6 at writing emails. It’s sized to train thousands of specialized models — each running against a domain-specific data moat, each priced an order of magnitude higher than general-purpose API queries, each with switching costs that make it sticky. That math works. It works the way the railroad-buildout math worked in the 1880s — overbuilt for the immediate workload, exactly right for the next century, with the operating-revenue capture happening on top of the rails rather than inside the original passenger lines.

And here’s the margin lever the consensus is also underpricing: edge deployment. Specialized models are smaller than frontier general models. A 7-billion-parameter coding model that beats GPT-4 on a specific coding task can run on consumer hardware. IBM Granite 4.1 is an 8B model competing with models four times its size. Google’s Gemma in the Gemini CLI v0.40.0 routes between local and cloud automatically, with full local execution on the roadmap. Apple Intelligence’s neural-engine deployment — even though Apple’s broader AI strategy is misfiring — is the architectural template every other lab will copy. Microsoft just shipped a DLSS competitor on the Xbox Ally X handheld, running entirely on local silicon. The on-device model parade is already lined up.

The labs’ margin arithmetic transforms once edge deployment scales. Today: customer pays per query, lab pays for inference, the marginal cost on each token is the cloud GPU bill. Tomorrow: customer pays a license fee for the specialized model, customer pays for their own inference via their own consumer or enterprise silicon, the lab’s marginal cost on incremental usage drops toward zero. That’s a 10-20x margin improvement at scale, on the workloads that are most easily specialized — which are also the workloads that grow the fastest. The labs don’t need to charge $20 a month for a chatbot when they can charge $200 a year for a specialized coding model that runs on the developer’s MacBook Pro and a $200,000 enterprise license for a specialized risk model that runs on the bank’s on-prem GPU cluster. The pricing model goes from cloud-SaaS to Adobe Creative Cloud meets enterprise software, and the economics get dramatically better.
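
The 10-20x figure is marginal-cost arithmetic, not magic. A sketch with illustrative numbers (nothing below is a disclosed figure):

```python
# Cloud-inference model: every incremental million tokens costs the lab
# real GPU time. All numbers assumed, for shape only.
price_per_mtok = 15.00      # what the customer pays, $/M tokens
gpu_cost_per_mtok = 13.50   # assumed fully loaded serving cost, $/M tokens
cloud_margin = price_per_mtok - gpu_cost_per_mtok   # $1.50 per M tokens

# Edge-license model: license revenue is fixed, and incremental tokens
# run on the customer's own silicon, so the lab's marginal cost is ~0.
edge_margin = price_per_mtok - 0.0                  # $15.00 per M tokens

print(f"marginal-margin multiple: {edge_margin / cloud_margin:.0f}x")  # 10x
# Squeeze the assumed GPU cost closer to the price, as competition
# tends to, and the multiple runs toward 20x.
```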

There’s a regulatory release valve in the same move. On-device specialized models solve the data-residency problem that’s currently making large enterprises pull back from frontier APIs. Goldman Sachs pulling Claude over Hong Kong exposure becomes a non-issue if the model runs inside Goldman’s perimeter on Goldman’s silicon, with no data ever leaving the deployment. The White House blocking Mythos expansion to 70 organizations becomes a non-issue if the specialized version runs on government hardware in a SCIF. The geopolitical and regulatory friction the labs are running into right now is a friction of the centralized inference model. The on-device specialized model dissolves it.

This is the synthesis. The hyperscalers are right that AI capex compounds. They’re wrong about what runs on it and who captures the operating revenue. Silver-style world models become the substrate that makes specialized models smarter at their specialty. Specialized models become the layer that captures domain-specific revenue. Edge deployment becomes the margin lever that turns the labs into software-license businesses instead of cloud-inference utilities. The text-only frontier labs at sovereign-infrastructure valuations are the one part of the stack that doesn’t fit the picture. They’re the originating passenger railroads of the 1880s. The rails are real. The locomotives are real. The cargo is what compounds. And the cargo is going to be carried by companies most of you have never heard of, running on hardware most of you already own.

🚂 The Railroad

The historical analog isn’t subtle and it’s worth being clear about. The 1880s railway buildout produced the largest infrastructure overbuild in American economic history. Roughly 75,000 miles of track laid in a decade, financed by foreign capital and domestic bond issues at multiples that, in retrospect, made no sense. The original investors got wiped out — the Panic of 1893 bankrupted a quarter of all U.S. railroads, including the Union Pacific, the Northern Pacific, and the Atchison, Topeka & Santa Fe. The rails were exactly right for the next century of industrial economy. The companies that captured the value were not the ones who built the rails. They were the ones who shipped the cargo on top of the rails after the financiers went bankrupt — Sears Roebuck (mail-order catalog distribution), Standard Oil (regional distribution to refineries), Carnegie Steel (raw material to mill to finished goods), Swift & Co. (refrigerated meatpacking from Chicago to New York). The rails compounded the economy. The bondholders ate the loss. The cargo got rich.

The hyperscaler capex of 2026 is the rails. The text-only frontier labs at $900 billion are the originating passenger-railroad operators who are priced as if they will carry every passenger forever. The actual long-term value gets captured by whoever rides on the rails with the proprietary cargo nobody else has — Customers Bank with its loan-decision data, Mayo Clinic with its diagnostic outcomes, xAI through its X firehose, Tesla through its autopilot fleet, Figure through its humanoid-production data, Recursion through its biology-screen libraries, every regulated industry that gets a specialized model deployed inside its perimeter and starts generating closed-loop training data nobody else can replicate.

That gives an allocator three layers to position around — and none of the three is the one the consensus is paying full sticker for.

Layer one: the substrate. The wires, the chips, the power, the connectivity silicon. This wins regardless of which model architecture, which lab, or which specialization eats the world. Astera Labs just got a $6.5 billion Amazon component-purchase warrant, 13x its prior commitment, because Amazon now expects to spend more on Astera’s chips over time than Astera’s entire trailing-twelve-month revenue line. Marvell. Broadcom. Credo. Lumentum. The nuclear-and-grid plays — China testing a 10 megawatt truck-mounted reactor for AI data centers is the strategic version of what every Western utility is going to need by 2028. The hyperscalers themselves as infrastructure operators — Microsoft, Google, Amazon, all priced today like model-pricing stories but actually owning the rails. This is where the $1.8 trillion of capex compounds, and it’s the cleanest exposure to the AI buildout that doesn’t require picking the architecture.

Layer two: the architecture hedge. Silver’s Ineffable Intelligence. World Labs. Sakana AI. Reflection AI. The next $5 billion-tier alt-architecture rounds. These are the puts on the consensus, written for less than 1% of fund by the same investors funding the consensus. If you’re a private allocator, you write into these the way Sequoia and Lightspeed just did — because the cost of being wrong is rounding error and the cost of being right is missing the next $500 billion lab. If you’re a public-market allocator, you don’t have direct exposure yet, but you can track the next IPO of an alt-architecture lab the way you’d have tracked Cisco’s S-1 in 1990. The seat costs $200 million. The table is a trillion eight.

Layer three: the data moat. This is the one nobody is talking about clearly yet, and it’s the one I’d bet most heavily on if I were running fresh capital. Companies whose moat is proprietary closed-loop interaction data nobody else can replicate. Vertical AI deployments inside regulated industries. Fleet-data owners. Humanoid-production data. Biology-screen libraries. Agentic deployment incumbents inside specific verticals. Whatever Silver’s architecture turns out to be, it eats data. The companies that own the right data become irreplaceable. They’re the Sears-on-top-of-the-rails of the next century — unspectacular, slow-compounding, and structurally impossible to displace once the data flywheel is spinning. SoftBank’s planned $100 billion IPO of Roze AI is the first major public-market expression of this thesis at scale. It will not be the last.

What gets squeezed: the pure-text-scrape labs at sovereign-infrastructure valuation. Anthropic at $900 billion and OpenAI at $850 billion priced as terminal generalists. They survive — they’re brilliant coding engines and they’ve built distribution that genuinely matters. They just don’t deserve general-intelligence pricing, because they aren’t general intelligence. They’re tier-one specialists in the workload their architecture does best. The market is paying them to be gods. The architects of the field are betting they’re not. The smart money already hedged. The retail money is paying full sticker for one side of a trade the smart money already knows is two-sided.

That is the position to close on. Don’t get attached to anything you can’t walk out on in thirty seconds flat. Not your model vendor. Not your cloud. Not your architecture thesis. Not the consensus from last quarter’s earnings call. The labs are coding engines that learned to talk. The hyperscalers are the rails. The specialized models are the locomotives. The data moats are the cargo. The architecture is mid-transition, the consensus has not yet priced it, and the only honest position is the one Neil McCauley described in a Los Angeles diner thirty-one years ago.

The heat is around the corner.

— Harry and Anthony
