
I Am Iron Man

THE NUMBER: 200 milliseconds — the latency budget Mira Murati’s Thinking Machines Lab chose for its first product, Interaction Models, released to research preview this morning. Two hundred milliseconds is the unit her system uses to ingest voice, video, and text in streaming chunks. It is also, not coincidentally, the threshold below which human conversation stops feeling like turn-taking and starts feeling like presence. Sub-200ms is when latency disappears. You stop waiting for the machine. You start talking to it. The same number surfaced at the intelligence layer of the stack rather than the interface layer on Sunday night, when Tomasz Tunguz of Theory Ventures named it in Localmaxxing, his report on five weeks of running half of his daily work on a local 35B-parameter model on a MacBook Pro M5. Tunguz’s exact line: “in reality, the only [reason that matters] is latency.” Two thinkers, two layers of the stack, one binding variable, the same week. “Sometimes you gotta run before you can walk,” Tony Stark told Obadiah Stane in 2008, and the line lands twice — once as Stark’s wisecrack about prototype iteration, and once as the actual operating principle of the second quarter of 2026. The bottleneck didn’t go away. It moved. Yesterday’s Lucy piece removed the limiter at the org chart. Today’s news shipped the rig that runs on the other side of it. Four layers. One operator. I am Iron Man.


The Mark I, you’ll recall, gets built in a cave. Tony Stark is a captive in Afghanistan, a piece of shrapnel inching toward his heart, a workbench made of scavenged Stinger parts and a magnetic chest plate that is keeping him alive ten minutes at a time. He builds the suit not because it is the optimal vehicle for force projection. He builds it because it is the only thing he can build with what is in front of him — and because, with it on, he can walk out of the cave. The Mark I is ugly. The Mark I works. Yinsen, watch your six.

The Ascend COO in yesterday’s Lucy issue was the Mark I. One operator. A senior judgment that already existed. A stack of Claude Code slash commands assembled in the dark. Twenty million ARR walked into a cave with no growth channel; thirty-eight percent ARR growth walked out, on a magnetic chest plate of /daily-ad-review and /new-campaign and /weekly-growth-report. What AI unlocks is not capability the operator did not have. It is capacity the operator could not previously deploy because building the deployment vehicle was the whole job. Tony Stark did not become a better engineer in that cave. He removed the constraint between his existing judgment and the work that needed to happen, and he walked.

Tonight’s piece is the Mark IV. The cycle today shipped the architecture that replaces the cave workbench with a real workshop. Four layers, one operator, the same logic at every layer: find the slowest piece, optimize around it, and the rest compounds. The four layers are the input layer (how the operator talks to the suit), the agent layer (the subsystems the suit runs autonomously), the intelligence layer (which model handles which call), and the output layer (how the suit’s results come back through the operator’s eyeball). Every layer that landed today named the same constraint: latency. Every layer’s news pointed at the same conclusion: the operator is the harness, the suit is the rig, and the slowest piece of the suit is the binding piece.

And — because the cycle has a sense of timing — the same Tuesday that put the four layers on the table also put Sam Altman on a witness stand and let an attorney named Steven Molo ask him Cap’s question. Big man in a suit of armor. Take that off, what are you? The Altman answer is the cautionary back half of this piece. The Stark answer is the front half. Both matter. The suit does not redeem you. Whether you can take it off and still answer Cap honestly does.

Let’s walk the layers.

🥽 The Console — Mira Murati and the Two-Hundred-Millisecond Loop

The JARVIS interface is the part of the suit that talks to Stark. It does not control the suit; Stark controls the suit. It does not make the decisions; Stark makes the decisions. What it does — and what makes it indispensable — is carry the cognitive overhead of every subsystem at the speed of human attention. Stark says “give me the diagnostics,” and the diagnostics arrive in voice, in HUD overlay, in a graph he can poke. He does not type a query into a chat window. He does not wait three seconds for a buffering spinner. He talks; the suit talks back; the loop closes inside a single beat of human cognition.

That is what Mira Murati shipped today. Thinking Machines Lab, her post-OpenAI lab, has been on a public information diet for fourteen months. Today the diet ended. Interaction Models — a research preview, the lab’s first product — is a real-time human-AI system that ingests voice, video, and text in 200-millisecond chunks, perceives the world in a streaming loop, and keeps talking while a second background model runs the slower reasoning, the tool calls, the multi-step search. It is, structurally, exactly the dual-loop architecture Stark’s helmet runs in Iron Man 2: a fast loop that handles the immediate interaction, and a slow loop that handles the analysis the operator does not have time to wait for. The system reacts to visual changes. It will count your reps. It will translate live as a conversation unfolds in two languages. The 200ms unit is not a throughput benchmark. It is the perceptual threshold below which human conversation feels real.
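
The dual-loop shape is easy to sketch. A minimal illustration in Python asyncio — every function name, timing, and string below is invented to show the pattern, not Murati’s implementation: the fast loop answers inside the 200ms budget with whatever it has, while the slow loop keeps reasoning off the critical path.

```python
import asyncio

LATENCY_BUDGET_S = 0.200  # the 200 ms perceptual budget

async def slow_reasoner(query: str) -> str:
    # Stand-in for the background model: tool calls, multi-step search.
    await asyncio.sleep(1.0)
    return f"deep analysis of {query!r}"

async def fast_responder(query: str, background: asyncio.Task) -> str:
    # Answer inside the budget with whatever is ready; shield() keeps the
    # slow task alive when the timeout fires.
    try:
        return await asyncio.wait_for(asyncio.shield(background),
                                      timeout=LATENCY_BUDGET_S)
    except asyncio.TimeoutError:
        return f"quick take on {query!r} (deep answer still running)"

async def interact(query: str) -> tuple[str, str]:
    background = asyncio.create_task(slow_reasoner(query))
    quick = await fast_responder(query, background)  # sub-200ms path
    deep = await background                          # completes later
    return quick, deep

quick, deep = asyncio.run(interact("count my reps"))
```

The design choice that matters is the `shield`: a timeout on the fast path must never kill the slow path, or the operator gets speed at the price of depth.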

The strategic position of this announcement matters more than the demo. The rest of the AI industry has spent twelve months racing toward “agentic-first” — long-horizon, autonomous, multi-step delegation. Claude Code’s /goal command, released this week, will run an autonomous task loop with no human intervention. Cognition’s Devin (more on Devin in a moment) is at $445M ARR running unattended. Murati’s bet is that the field is solving the wrong frontier. The agentic race optimizes for unattended capability over time. Murati is optimizing for attended capability inside the latency budget of human perception. These are not the same product. They are not even the same market. They are complementary halves of a hybrid that nobody has named yet, and Murati just took the position with no major competitor on it.

The point is operational, not philosophical. Some work is naturally serial. You think out loud. You react to feedback in real time. You demonstrate a physical movement and want the system to mirror it. You ideate in a session where each prompt depends on the previous answer and the rhythm matters. That is the phone call. Real-time voice with a sub-perceptual latency budget is the optimal interface for serial work, and the entire AI industry has been building text-and-agent products that handle serial work clumsily. Other work is naturally parallel. You launch three campaigns. You watch six metrics. You delegate a task that will take forty minutes and pick up another while it runs. That is the office visit with the boss. You bring multiple items because the boss can batch you across them.

The honest answer to “which interface wins” is both, on different tasks, on the same operator’s desk. Omar Ismail at Ascend built /daily-ad-review as the parallel layer — agents that run in the background, watch the metrics, batch the work. He would absolutely use Murati’s Interaction Models for the upstream work: the brand brief, the ICP discovery session, the live walk-through of a sales-call transcript, the strategic conversation where the rhythm of feedback matters and three-second buffering kills the thread. Phone call for inception. Office visit for parallel execution. The 2026 operator is going to live with one foot in each, and the suit is the thing that hands them off without making the operator switch tools.

The differentiation here matters because the voice-input gold rush of the last six months has confused the question. Monologue, the dictation tool from the team at Every. Whisper.ai. Whisperflow. A dozen others. Each of these products treats voice as an input layer the way a stenographer treats voice — a faster way to type. The user dictates; the machine transcribes; the transcript is fed into a downstream LLM. The latency improvement on the input side is real. The interaction shape is unchanged. It is still a one-way pipe. You say a thing. The machine waits. The machine writes it down. The machine returns an answer. The clock has been moved, but the conversation has not. Murati’s bet is that this entire framing of voice-as-faster-typing is the wrong product. The 200ms loop is not faster transcription. It is bidirectional real-time — the machine perceives, responds, and adjusts inside the perceptual window of the operator. The dictation products are word processors with a microphone. The Interaction Model is the helmet’s voice channel.

The Murati announcement is small as news. Research preview, no API pricing yet, no general access, no major enterprise pilot. The Murati announcement is large as positioning. She just put the first major lab flag in the ground on the half of the operator’s day that everyone else is treating as a wait state. Two hundred milliseconds is not the unit of a chatbot. It is the unit of a JARVIS console.

Why this matters for you: The operator’s tool stack in 2026 is no longer “ChatGPT or Claude.” It is Murati for serial work, an agent harness for parallel work, and the meta-skill of knowing which task wants which interface. If your team is using a single AI tool for both ideation and execution, you are running serial workflows through a parallel pipe (slow, frustrating) and parallel workflows through a serial pipe (wasteful, fatigue-inducing). Spend an hour this week mapping which of your three most important AI-assisted workflows is naturally serial and which is naturally parallel, and start picking the right interface for the right task. The Murati interface alone is a research preview. The discipline of choosing the interface to match the task is a skill you can build today on the products you already have.

🛡️ The Subsystems — Idira, UiPath, AWS, and Bezos’s COO Arriving in Code

The second layer of the suit is the part that does the work without bothering the operator. The repulsors. The flight stabilizers. The targeting computer. The shoulder-mounted micro-missiles. Stark does not aim each one individually. He commands them through JARVIS, and JARVIS commands them through a stack of subsystems that have their own identity, their own permissions, their own audit trail, their own envelope of allowed action. When a subsystem does something stupid — say, a Hammer Industries drone goes haywire and starts firing at a crowd — the operator wants to know which subsystem did it, what permission it had, and how to revoke that permission without taking the whole suit offline. Subsystem identity is non-negotiable. Subsystem orchestration is non-negotiable. Subsystem governance is non-negotiable. In the Marvel universe, JARVIS provides all three. In 2026, the cycle just shipped the equivalent.

The single best Aligned editorial framing of the day was buried inside a section blurb that nobody else will quote: “Agent sprawl is the top new enterprise concern as AI agents proliferate without shared infrastructure.” That is the entire problem. Omar Ismail at Ascend was one COO with a personal stack of slash commands. Multiply that by ten thousand employees in a Fortune 500 and you have ten thousand orphan agents running with no identity, no payment authority, no audit trail, no shared orchestration, and no rules about what they can spend or send or sign on the company’s behalf. That is not a productivity miracle. That is a swamp. The cycle this week shipped the swamp drainage equipment. Palo Alto Networks launched Idira today — the first identity platform built for AI agents, not for humans, on the explicit recognition that traditional IAM was designed for an entity that walks into the office and badges in. Agents do not badge in. They spawn. They need their own identity primitive. Idira is the credential plumbing the swamp needs.

UiPath, in a parallel move, opened its enterprise orchestration layer to any coding agent — Claude Code and Codex going first, but the door is now open. The bet is that orchestration, not the agent itself, is where the durable enterprise AI automation value accrues. This is the same logic the cloud era already proved at the IaaS layer: the layer that durably captures value is the one that handles failover, retry, observability, and cross-vendor portability. UiPath, like AWS before it, is positioning itself to be the orchestrator that does not care which specific agent ran the job, only that the job got logged, monitored, retried on failure, and audited end-to-end. The agent is the product. The orchestrator is the moat.

Anthropic Claude Platform reached general availability on AWS on Sunday with native IAM authentication, native AWS billing integration, and Managed Agents as a first-class object inside the AWS console. The story dropped quietly. The implication did not. Anthropic just stopped being a separate vendor at the enterprise integration layer and became a native cloud primitive. The IT director who used to negotiate a separate Anthropic contract through procurement, fight for a separate billing relationship, and run a separate identity provider for the Claude console — that IT director just got fifteen hours of their month back. What used to be a vendor-to-vendor integration is now an AWS service. Aligned’s amplification told the story: 952 likes across seven curated lists by sundown. The day’s most-amplified event was not the GPT-5.5 leak or the Cerebras IPO filing. It was a quiet GA announcement that completed the cloud-ification of the frontier lab.

Even the regulators showed up today. Joint cybersecurity agencies — CISA in the U.S., ASD in Australia, NCSC in the U.K. — published the first major government guidance on secure adoption of agentic AI. The guidance, read in the spirit of yesterday’s Groundhog Day and Vector Two thesis, says the quiet part out loud: agents that spend money, send messages, modify systems, or act on behalf of a user are now a national-security category, and the operating model that worked for SaaS does not work for them. The government did not write the joint guidance because the government suddenly got smart about AI. It wrote the guidance because the operational base of agent deployment is now wide enough that uncoordinated agents are starting to constitute a systemic risk.

The Bezos parallel here is the one that lands hardest, and the one nobody is naming. At an early Amazon, Bezos was famously full of ideas — too many ideas to ship. Jeff Wilke, the SVP who would go on to run Amazon’s worldwide consumer business, pulled him aside roughly a year into their working relationship and delivered the line that has been making the rounds on X this week: “Jeff, you have enough ideas to destroy Amazon.” Not because the ideas were wrong. Because the company could not absorb them at the rate they were being introduced, and a company that tries to ship its CEO’s full idea bandwidth ships nothing at all. Wilke’s actual operating principle is the part that lifts cleanly into 2026: “You have to release the work at the right rate that the organization can accept it.” Unreleased work is not a backlog. Unreleased work is adding no value and creating distraction. The bottleneck, Wilke told Bezos, is not the idea generator. It is the organization’s intake rate. The slowest layer of the deployment chain rate-limits everything else. That sentence is Amdahl’s Law applied to product development. It is also the entire four-layer suit thesis in a single Amazon SVP’s clean English. Wilke was the deployment governance layer that converted Bezos’s bandwidth into shipping software. Wilke was the COO function before anyone called it the COO function.

In May 2026, Bezos’s COO arrived as a buyable infrastructure stack. Palo Alto Idira is the identity COO. UiPath is the orchestration COO. AWS IAM is the permissions COO. The joint cybersecurity agencies are the policy COO. Together, they are the operating system for an enterprise that wants its individual operators to be Omar Ismail — and wants the institutional version of “no, we cannot ship that one yet” running underneath. The miracle of yesterday’s Ascend case study was that one COO did the constraint function and the deployment function. The trade-off is that one COO with one stack of slash commands cannot do it at the scale of ten thousand operators. You need the stack to be the COO. This week is the week the stack became the COO.

There is a contrarian read of agent sprawl that we have not seen anyone else file yet and that follows directly from the Aligned framing. *If interoperability across larger organizations is the binding constraint on AI deployment — if every node where one person’s agents touch another person’s agents is a potential point of failure — then the optimal-size organization in 2026 is one that minimizes those nodes.* A solopreneur has zero interop boundaries inside the company. Two of them have one. A ten-person agency has roughly forty-five. A thousand-person enterprise has roughly five hundred thousand. The combinatorial math is brutal in exactly the direction that the news cycle just made expensive. The cost of governing N agents in a sprawl scenario scales not with N but with N². The advantage of being small in the digital economy of 2026 is no longer “we are scrappy and underdog.” It is “we have fewer interop boundaries, fewer audit surfaces, fewer orchestration failure modes, and we ship faster than the enterprise can buy the Idira-UiPath-IAM stack that lets it match our cadence.” The Ascend case study is the existence proof for what one operator with the four-layer suit can do. The Aligned warning about agent sprawl is the existence proof for what an enterprise has to do to even compete. The Aligned warning is N². The Ascend case is N=1. The N² problem does not have a software fix; it has a structural fix, and the structural fix is being smaller.
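
The combinatorics above are just the handshake formula, n(n-1)/2. A five-line sketch keeps the scaling honest (the function name is ours, not Aligned’s):

```python
def interop_boundaries(people: int) -> int:
    # Pairwise nodes where one person's agents can touch another's: n(n-1)/2.
    return people * (people - 1) // 2

sizes = {n: interop_boundaries(n) for n in (1, 2, 10, 1000)}
# {1: 0, 2: 1, 10: 45, 1000: 499500}
```

The solopreneur’s zero, the agency’s forty-five, and the enterprise’s half-million all fall out of the same quadratic.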

To be precise about where this argument lives: it applies to the digital part of the economy. A solopreneur with a Claude Max plan is not putting Walmart out of business. Walmart has 4,700 stores, a refrigerated supply chain, a private-label brand, and a hundred thousand truck-and-trailer assets. Physical dependencies are still moats. But a solopreneur with the four-layer suit absolutely is putting a forty-person marketing agency, a twenty-person consulting practice, a fifteen-person software shop, and a ten-person creative studio under structural pressure. The digital part of the economy compounds at the cadence of the operator’s suit. The physical part of the economy compounds at the cadence of the operator’s logistics. The Q2 2026 question for every services-economy company between five and a thousand people is whether being that size is now a liability rather than an asset. That is not a question most boards have asked. It is the question that the agent-sprawl headline of the week is implicitly putting on the table.

Why this matters for you: The order of operations for an organization that wants to deploy AI across more than one team is now non-negotiable: identity first, orchestration second, application third. Most enterprises got the order reversed in 2024 and 2025 — they bought the application (a Copilot, an agent, a CRM integration) and tried to retrofit identity and orchestration on top. That is how you get agent sprawl. The right move in Q2 2026 is to pause the third-tier agent rollouts and stand up the identity-and-orchestration layer first. The Idira/UiPath/AWS-IAM stack is buyable today. The agents you deploy on top of it next quarter will compound. The agents you deployed without it last quarter are an audit risk you are about to have to triage anyway.

⚙️ The Power Source — Tomasz Tunguz, Localmaxxing, and the Camry in the Garage

The third layer is the suit’s power source. The arc reactor in Stark’s chest is what powers everything — repulsors, flight, JARVIS itself. It does not need to be the largest possible energy source. It needs to be the one with the right power-to-weight ratio for the job in front of it. The Mark I arc reactor is a chest plate. The Mark IV arc reactor is a vibranium-cored briefcase device. The Mark XLII arc reactor is a sub-dermal implant. Each one is matched to the form factor of the suit and the duty cycle of the operator. A nuclear submarine reactor would also power the suit. It would also make the suit unflyable.

This is what Tomasz Tunguz is talking about in Localmaxxing, which dropped to your inbox on Sunday night and which is the cleanest operator-grade piece written this month on the model-selection question. Tunguz, a venture capitalist who runs Theory Ventures, spent five weeks deliberately routing his daily AI workload to a local 35B-parameter model — Qwen 3.6 35B-A3B-4bit, running on a MacBook Pro M5 — and benchmarked the output against Claude Opus 4.5 via API. The headline result: half of his 1,471 tasks over five weeks could be completed correctly on the local model. Email and Inbound, Scheduling, Summarization, and Admin alone accounted for 41.8 percent of his volume and ran on the local model with no degradation. Market Research and Engineering split roughly 50/50 between simple tasks (data lookups, script fixes) that ran fine locally and complex tasks (multi-source synthesis, architectural decisions) that needed the frontier model. That gets him to 50 percent of total workload on local hardware.

The benchmark Tunguz published is the part to lift verbatim into your operations doc. Eight agentic tasks, identical prompts, both models warmed. Qwen 3.6 35B-A3B-4bit on his laptop versus Opus 4.5 over the API. “The local model isn’t smarter,” Tunguz writes. “Opus 4.5 scores ~20% higher on reasoning benchmarks. Local models lag frontier by 3-4 months, and for large-scale complex tasks, that gap matters. But for routine agent tasks, it rarely does.” Opus wins on structure and polish — bullet points, headers, cleaner code. Qwen wins on brevity, often half the tokens. Both completed the tasks correctly. For an agent task where the output feeds into the next system in a pipeline, terseness is a feature. And the only reason that actually matters in the choice: latency. The local model is roughly 2x faster on tasks that fit its capability envelope. “If half the work runs 2x faster on my laptop, I’ll take that trade every time.”

This is the Ferrari-and-Camry decision applied to the intelligence layer of the suit. Imagine getting 85 percent of the performance of a Ferrari for 20 percent of the cost — at twice the speed, with the engine sitting in your garage instead of a remote API endpoint that may rate-limit you on a Tuesday afternoon. That is the trade Tunguz priced for himself, and the trade is not exotic. It is the same trade Iron Man makes every time he leaves the workshop with the Mark suit appropriate to the duty cycle. You do not deploy the Hulkbuster Mark XLIV to interview a witness at a coffee shop. You do not deploy the stealth Mark VII to fight Thanos in space. The wrong arc reactor for the job is a strategic mistake, not a budget question. A team that runs every workflow on Opus or GPT-5.5 is paying frontier prices and absorbing API-rate-limit risk on tasks that a 35B Qwen would have closed at twice the speed.

The reason this lands as a four-layer-stack story rather than a model-selection footnote is the word Tunguz used. The privacy argument is real. The cost argument is real. The asset-depreciation argument — your MacBook depreciates whether you use it or not — is real. Tunguz waves all three off. The only one that matters, in his words, is latency. That word — that exact word, that exact constraint, that exact framing — is the binding variable Murati picked when she chose 200 milliseconds as the unit of her input layer. Two thinkers, opposite ends of the stack, same week, named the same constraint as the one that actually drives the design decision. Neither cited the other. Neither had to. The constraint is downstream of human perception, and the operator is downstream of the constraint.

The implication for the operator stack is direct. Your model-selection policy is now part of your suit’s specification, not part of your procurement contract. The Tunguz-style audit — what fraction of your daily AI workload can move to a local Camry without degrading output, and what fraction genuinely needs the cloud Ferrari — is the kind of one-page document a competent COO writes in an afternoon. It does not require an enterprise procurement cycle. It requires an honest internal audit. And it produces the second compounding advantage of the four-layer suit: the operator who routes the right intelligence to the right task ships faster, cheaper, and with fewer rate-limit failures than the operator who runs everything on frontier.
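
What a one-page routing policy looks like in code is almost embarrassingly small. A hypothetical sketch — the bucket names follow Tunguz’s categories, but the model identifiers and the complexity split are our invention, not his published config:

```python
# Invented model identifiers for illustration only.
LOCAL = "qwen-35b-local"     # the Camry in the garage
FRONTIER = "opus-4.5-api"    # the cloud Ferrari
SPLIT = None                 # buckets that divide by task complexity

ROUTES = {
    "email_inbound": LOCAL,
    "scheduling": LOCAL,
    "summarization": LOCAL,
    "admin": LOCAL,
    "market_research": SPLIT,
    "engineering": SPLIT,
}

def route(category: str, complex_task: bool = False) -> str:
    # Unknown categories default to the frontier model; split buckets send
    # lookups and script fixes local, multi-source synthesis to the cloud.
    target = ROUTES.get(category, FRONTIER)
    if target is SPLIT:
        return FRONTIER if complex_task else LOCAL
    return target
```

The table, not the function, is the policy document: a COO can review and amend it without touching a line of orchestration code.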

Why this matters for you: Tunguz’s 50/50 number is a hypothesis specific to one VC’s workload mix. Your actual number — what fraction of your team’s daily AI calls could run on a local or open-source model without degrading output — is probably between 30 and 70 percent, and you do not know which yet. Spend a week categorizing your AI calls into Tunguz’s seven buckets (Email & Inbound, Scheduling, Market Research, Summarization, Engineering, Admin, and Other) and benchmark the eight most common ones head-to-head on a local 30-40B model versus your cloud provider. The number you find is your team’s actual ratio. The cost savings are real but secondary. The latency improvement is the compounding advantage. Every workflow that drops from a six-second cloud roundtrip to a two-second local response gets used three times more often by your team, because friction is fatal to adoption and speed is the antidote.

👁️ The HUD — From Markdown Walls to Interactive Surfaces

The fourth layer of the suit is how the suit’s output reaches the operator. Stark’s HUD inside the helmet does not present him with a 4,000-word wall of text describing the tactical situation. It overlays the data on the world in front of him. Heart rate of the hostage in the next room. Weapon serial numbers on the hostile fire team. Suit power remaining. Distance to the rescue chopper. The HUD’s design assumption is that the human eyeball is the slowest piece in the entire chain, and the suit must therefore deliver information in the form the eyeball is fastest at consuming. Visual overlays. Color-coded states. Spatial relationships. Pictures, not paragraphs.

This is the half of the suit specification that AI companies have, until very recently, refused to take seriously. The default output of every LLM is a wall of markdown. Bullet points. Headers. Numbered lists. The walls scroll. Nobody reads them. The human brain is a visual processing engine with a hard limit at roughly ten lines of consecutive text before attention collapses, and the AI industry has spent two years asking the slowest part of the human IO bus to absorb the entire output. A picture is worth a thousand words; a movie is worth a thousand pictures. Every operator who has ever filed a slide deck instead of a memo already knows this. The chatbot UI is the memo. The next-generation operator surface is the slide deck — except interactive.

This is what the .md-to-HTML thread that has been bubbling on the right corners of X is actually about. Markdown is the artifact of LLM output from the first inning of AI — a structured-text format optimized for the AI’s training pipeline, not for the operator’s eyeball. HTML is the next-inning output: interactive, visual, poke-able, sortable, filterable, re-runnable. The operator does not want to read a 2,000-word report on their customer pipeline. They want a live dashboard where they can click any cell and ask a follow-up question, sort by a different column, drill down into one row’s history, and re-run the analysis with different parameters. That is not a memo. That is a small custom app. The output layer of the suit, in 2026, is going to evolve from “the LLM wrote you a paragraph” to “the LLM built you a tool you can hold in your hand.”

The Manthan Gupta piece on voice agent memory architecture, today’s highest-scored research piece in our intake, tells the engineering version of the same story. Memory in voice agents cannot live in the response critical path. It has to be pre-loaded, pre-computed, or written after the fact. Why? Because the response critical path has a 200-millisecond budget — the same number Murati picked for the input layer — and the human IO bus does not tolerate buffering. The slowest piece of the chain has to be moved off the critical path or the entire interaction breaks. That is not a voice-agent rule. That is the entire optimization principle of the four-layer suit. Move the slowest piece off the critical path. Where you cannot move it, optimize it. Where you cannot optimize it, parallelize it. The slowest piece rate-limits everything else.

Apply that principle backward through the stack and the picture lines up. Murati optimizes the input layer to sub-perceptual latency. The agent layer (Idira, UiPath, AWS IAM) moves the identity-and-orchestration overhead off the operator’s critical path and into background subsystems. The intelligence layer (Tunguz Localmaxxing) routes the work to the model whose latency matches the task. The output layer (the move from markdown to interactive HTML) optimizes for the human eyeball’s actual bandwidth. Every layer is solving the same problem at a different point in the stack. Find the slowest piece in your operator’s workflow. Move it, optimize it, or parallelize it. Repeat at the next layer. Compounding does the rest.

Why this matters for you: If your team is producing AI-generated reports as markdown walls or PDFs, you are shipping the equivalent of a typewritten memo in 1995 when the rest of the company was already on PowerPoint. The next quarterly review your team prepares should be an interactive surface — a live document, a poke-able dashboard, a small custom tool — not a static deliverable. The tooling to do this is mature and shipping (Claude Artifacts, ChatGPT Canvas, custom HTML generation in any modern LLM workflow). The skill required is not technical. It is editorial: what is the one question the audience will want to ask after they see this artifact, and is the artifact built to answer it without requiring a follow-up email? Build for the eyeball. The eyeball is the slowest piece. The operator who optimizes for it wins.

🦇 The Vampires Already Learned to Fly It

Marc Andreessen named the operators who already learned to fly the suit, in a phrase that is going to be on every newsletter Wednesday morning, and the phrase is AI vampires. Not the programmers who got replaced by AI. The ones who didn’t. The ones who picked up Codex, Claude Code, and the rest of the AI coding tools and actually used them. Those programmers got stronger. More capable. More productive. Like a vampire who feeds and grows rather than fades. The framing is borrowed and a little melodramatic, the way Andreessen-canon usually is. The data point that makes the framing more than rhetoric is buried two stories away, in the Cognition Devin number that surfaced into Aligned’s evening highlights tonight. Cognition’s Devin is at $445 million ARR, with usage doubling every eight weeks, and the customer list now includes the U.S. Army and Goldman Sachs.

Run the math out one year. Eight-week doubling over 52 weeks is 2^(52/8) = 2^6.5, roughly 90x. Cognition will not actually 90x in twelve months — adoption curves bend, customers churn, the doubling cadence will slow as it matures. But even at half that rate, the compounding gap between the operator who is fluent on the suit and the operator who is not opens to a 15-to-50x range over twelve months. The Andreessen framing as Twitter color is interesting. The Andreessen framing paired with the Devin compounding number is strategic. The gap between your vampire and your non-vampire is not a productivity delta. It is a compounding curve. Twelve months at 8-week doubling separates the people who can take the suit out of the workshop from the people who cannot. It also separates the companies that figured this out in Q2 2026 from the ones who did not.

Microsoft’s 2026 Work Trend Index, which we covered at length in yesterday’s Lucy issue, is the corroborating dataset. Nineteen percent of the workforce is in Microsoft’s “Frontier” tier — skilled workers in ready organizations. Ten percent is “Blocked” — skilled workers stuck in companies that cannot use what they can do. Half is “Emergent” middle. The rest is “Catching Up.” The Frontier tier is the vampire tier. The Blocked tier is the vampire who works at the wrong company. The Emergent middle is the population the Q2 2026 cycle is targeting — the operators who could become vampires if their company stood up the identity/orchestration/intelligence/output stack the cycle just shipped. The people who learned to fly the suit are now compounding. The people who have not are now decoupling from the compounding curve. This is not a productivity gap. It is a divergence.

The actionable read is sharp. Hiring decisions in 2026 are no longer annual gates. They are quarterly bets on the slope of the compounding curve. If your vampire engineer is shipping at 2x today and doubles every 8 weeks, the gap to your non-vampire engineer at 8 weeks is 4x, at 24 weeks is 16x, at 52 weeks is the kind of number you cannot bridge by hiring more non-vampires. You have to either (a) buy the vampire at a market-clearing price, (b) build the vampire by retraining the engineer on the suit, or (c) accept that you are running a non-vampire shop in a vampire market and price your strategy accordingly. Most companies are going to spend Q3 negotiating against (a), failing to execute (b), and pretending they are not running (c). The CFO conversation in Q4 will be about whether the company has a vampire problem or a vampire opportunity. That conversation is already overdue at most mid-market firms.

Why this matters for you: Look at your current hiring pipeline. For each open seat, ask: does the person we are likely to hire know how to fly the four-layer suit, or are we hiring for a 2024 skill profile? The four-layer suit changes the answer to “what is a senior engineer worth” because the senior engineer who is fluent on Claude Code, has a model-routing policy, runs orchestration through UiPath or equivalent, and produces interactive artifacts is not a 2x engineer. They are a 10-to-50x engineer over a one-year horizon. The market-clearing price for that profile is rising faster than your comp bands. The non-vampire seat you fill in May 2026 at 2024 comp is a marginal hire. The vampire seat you fail to fill in May 2026 is a structural loss. Get your hiring team trained on what to assess. The interview question that matters now is “walk me through the last AI-assisted workflow you built and which model you routed it to and why.” If the candidate cannot answer, they are not flying the suit yet. They may be teachable. They are not yet a vampire.

🎭 Take Off the Suit — Sam Altman on the Witness Stand

“Big man in a suit of armor. Take that off, what are you?” The line is Steve Rogers, in The Avengers (2012), on the helicarrier deck, after Bruce Banner has nearly Hulked out and Stark and Rogers are circling each other like the alpha males they both are. Stark’s answer lands because it is the most honest thing he says in the entire MCU run up to that point. “Genius, billionaire, playboy, philanthropist.” Three of those are bravado. One of those is true. Stark, in that beat, names what he is without the suit, and his self-awareness is the reason the line works. He concedes the bullshit. Cap walks off muttering “a hero like you,” and Stark grins, because Stark knows.

This morning, in a federal courthouse in Oakland, Steven Molo, Elon Musk’s lead counsel in Musk v. Altman, opened his cross-examination of Sam Altman with Cap’s question. “Are you completely trustworthy?” He then read into the record the testimony of Altman’s former OpenAI board members and senior leadership — Ilya Sutskever’s Monday testimony describing a “pattern of lying” most directly — and asked Altman whether it bothered him that the people who knew him best had publicly testified, under oath, that he lied. Altman said yes. Then Molo asked the question the trial existed to ask. Sam, do you always tell the truth? And Altman’s answer is going to be on every newsletter Wednesday morning, and it deserves to be on every newsletter, because it is the exact opposite of Stark’s answer to Cap.

“I’m sure there are some times in my life when I did not.”

That sentence is the Clinton parsing move with the calendar pushed forward thirty years. Concede the categorical impossibility, refuse the specific instance. Nobody always tells the truth. Everybody has lied sometime. Yes, Counselor, I have lied. To whom and about what is a different question and you will need to file a separate motion. It is the deposition equivalent of “it depends on what the definition of is is.” And it lands on the public record next to Sutskever’s $7 billion stake, Greg Brockman’s journals, Mira Murati’s now-vindicated departure to build Thinking Machines Lab (yes — that Thinking Machines Lab, the one that opened this very piece), Emmett Shear’s seventy-two hours, Satya Nadella’s read-in-court 2022 email — “I don’t want to be IBM and OpenAI to be Microsoft” — and the entire architecture we have been writing about since Whose Side Is Sam Altman On? on April 28. Altman did not concede the bullshit. Altman insisted on the honesty. And in 2026, with five hundred and twenty-five ex-employees who walked away with $8.3 million each in the October 2024 tender and seventy-five who walked away with the full $30 million cap, insisting on the honesty in court is the move that has the camera lingering on your face after you finish.

The Stark move concedes what is real. I am a billionaire. I am a playboy. I am a philanthropist on Mondays. The performance Stark gives Cap is honest about the performance. The Altman move denies what is real. I am an honest and trustworthy businessperson, he said in the next breath, and the sentence sits on the docket next to Sutskever, Brockman, Murati, Shear, Nadella’s email, and the Anthropic void-stock-sale notice we covered in last night’s Lucy issue. The contrast is not subtle. Stark’s answer made the suit more trustworthy because he didn’t pretend the man inside it was. Altman’s answer made the suit less trustworthy because the man inside it kept insisting on a virtue the room had already documented otherwise.

The reason this matters in a piece about the four-layer operator stack is that the suit does not redeem the operator. The CO/AI thesis on the operator stack — the entire spine of this issue — is that the operator’s job is to assemble Murati-grade input, Idira-grade subsystems, Tunguz-grade intelligence routing, and HTML-grade output into a rig that lets them ship at vampire cadence. That assembly does not change the operator’s character. A non-trustworthy operator inside an excellent rig is still a non-trustworthy operator with the additional capability to do non-trustworthy things faster. The trial today put one of the most public and powerful operators in the AI industry on the witness stand and tested whether, with the suit off, he could give Cap a Stark-grade answer. The transcript will get parsed by every legal commentator on AI Twitter for the next forty-eight hours. The transcript already gave the answer.

There is a charitable read. Altman is on a witness stand. His lawyers told him to deny categorically, parse where pressed, and refuse to give Molo any quote that becomes a chyron on the evening news. “I’m sure there are some times in my life when I did not” is exactly that quote, but no version of an honest answer in a deposition is going to leave Altman cleaner than the version his lawyers prepared him for. There is also a less charitable read. Stark gave Cap the honest answer because Stark, in the script, had nothing to lose by giving it. Altman has $30 million per tender employee, an IPO calendar, and a board that already burned and re-hired him inside a single weekend in November 2023. The suit is the asset. Taking it off is not an option for the operator at the top of the trade.

But the suit is the asset for us too, dear reader, in the much smaller version of the question that the cycle is actually putting in front of you this week. Omar Ismail at Ascend can take his suit off at the end of the day. The Frontier-tier employee at the Microsoft-segmented Fortune 500 can take their suit off at the end of the day. You can take your suit off at the end of the day. The question Stark gives Cap a clean answer to — take that off, what are you? — is the question the four-layer stack does not answer for you. It cannot answer it for you. It can only answer it through you. The operator inside the suit is still the operator. The Camry under the hood does not pick the route. You do.

Why this matters for you: Pick the operator culture before you pick the operator stack. The companies that are going to win the second half of 2026 are not the companies with the best four-layer rigs. They are the companies where the people inside the rig can give Cap an honest answer. The board conversation in Q3 about “what is our AI strategy” is the wrong conversation. The board conversation in Q3 about “what is the standard of conduct of the people we are about to hand the four-layer rig to” is the conversation that decides whether your company is the one that compounds or the one that ends up on a witness stand five years from now with a $30 million tender on the table and an attorney named Steven Molo on the lectern. The suit does not redeem you. The character of the operator wearing it is the only redemption that holds up under cross.

🧬 The Accidental Polymath — Why the Synthesis Beats Either Half

There is a story investor Cyan Banister told on a podcast this week, surfaced into wider circulation by Mario Gabriele’s tweet today, and it deserves to land in this piece because it is the human-grade proof point for everything we have been building toward. Bees were dying. The headlines were dire. The conventional response from the technology industry was the Harvard RoboBees project — if the pollinators are going extinct, we will build mechanical ones to replace them. The actual solution came from a different direction entirely. A mycologist — almost certainly Paul Stamets, whose Reishi-and-Chaga-extract-for-bees research was published in Scientific Reports in 2018 — noticed unusual bee behavior during a walk through a forest. He developed a treatment from wood-rotting fungi that boosted bee immunity to the viruses driving the colony collapse. He was not an apiologist. He was a mushroom expert who had the wrong-discipline pattern recognition for the right problem at the right moment. Cyan’s larger point — the one that pays off the four-layer suit thesis with the force of a closing argument — is that AI makes this kind of cross-domain solving the norm rather than the exception. Her line, lifted verbatim because it cannot be improved: “when the knowledge gap between disciplines closes from decades to hours, the fluid dynamics researcher stumbles into a physics breakthrough; the ecologist becomes the apiarist.” And her conclusion: “The polymaths win — not because they know everything, but because they have the tools to connect what specialists cannot.”

That is the operating model the four-layer suit makes possible at the population level. AI alone has a hard time tying threads together across domains; it does not know what to notice. Humans alone have a hard time covering enough domains to notice across them; we run out of years. The synthesis — a senior human with thirty years of pattern recognition in one or two fields, plus a four-layer suit that handles the rest — is the only structure on the planet right now that produces this category of output at scale. It is the same operating principle running through every issue we have written since Warp Speed, Fast And Slow in April. It is the operating principle of this very piece. No AI on its own pulls together the Murati 200ms timing, the Tunguz latency framing, the Wilke quote, the Avengers helicarrier scene, the Endgame finger-snap, the Devin doubling cadence, the Aligned agent-sprawl blurb, the Homebrew-Computer-Club analog, Cyan Banister’s mycologist, and Sam Altman’s deposition into a single thesis with a four-layer architecture and an actionable closing. No human writes the same piece in 90 minutes from scratch either. The synthesis is the product. And the synthesis is what the operator with the four-layer suit is going to be capable of producing, across whatever domain their pattern recognition happens to live in, for the next decade.

This is the version of the Bostrom argument that lands without the dystopia. Yes, the operator becomes more capable than the operator alone has ever been. Yes, the suit makes the operator into a polymath in whatever direction their curiosity points. No, the operator does not retire and let the suit handle civilization. The polymath is the harness. The polymath is the one who decides which threads to pull and which to leave alone. The polymath is the one who knows what to notice — the way Stamets knew to notice a bee on a piece of wood-rotting fungus. The suit fills in. The Stark answer to “what are you without the suit” — for the population the four-layer rig is about to manufacture — is “a domain expert in one field with the curiosity to walk into a hundred others, and a JARVIS that turns the curiosity into output.”

Future Proof Pod Episode 6

Episode 6 – Why the AI Distribution Revolution Will Decide Future Market Leaders

Most companies are paralyzed by the “Fog of War” in AI — a relentless storm of product hype, unpredictable breakthroughs, and fleeting models. But the real game-changer isn’t just the technology; it’s how you navigate the chaos and turn distribution into your ultimate moat.

🌅 The Big Retirement — Bostrom Wants You to Never Take It Off

The closer is a closer because the timing is too perfect to ignore. On the same Tuesday that Mira Murati shipped the input-layer suit, that Palo Alto shipped the identity-layer COO, that Tomasz Tunguz priced the intelligence-layer Camry, that Manthan Gupta wrote the operator’s guide to the output layer, and that Sam Altman gave Bill Clinton’s answer to Steve Rogers’ question — on that same Tuesday, Nick Bostrom circulated a working paper titled “Optimal Timing for Superintelligence: Mundane Considerations for Existing People.” The man who founded AI existential risk as an academic field — the original author of Superintelligence: Paths, Dangers, Strategies — just changed sides.

The argument, in his words via Futurism’s coverage: a small chance of AI annihilating all humans might be worth the risk, because advanced AI might relieve humanity of “its universal death sentence.” The Aligned framing of the same paper called it the great retirement of humanity. The Bostrom paper itself uses “the Big Retirement.” The thesis is that we should build superintelligence as fast as we can because, on a long enough timeline, even the worst plausible outcome of building it is preferable to the certain outcome of not building it — namely, that every human alive today eventually dies of natural causes without ever having had the option of medical immortality. Bostrom describes himself in the new paper as “a fretful optimist.” The man who wrote the book on AI doom now writes that failure to develop superintelligence would itself be a catastrophe.

The frame that lands hardest, for an operator reading this piece, is the implied bargain. Let AI run things. Let it deliver the Big Retirement. Stop worrying about whether the operator inside the suit can take the suit off. The suit is now the civilization, and the suit will be more capable than any operator could ever be. It is the philosophical version of the Altman deposition strategy. Concede the categorical impossibility — yes, we might go extinct — in order to refuse the specific instance — and that’s worth it because the upside is so large. It is also the philosophical version of the agentic-first race that Murati just publicly counter-positioned against. The Bostrom argument and the agentic-first racing argument arrive at the same destination from different vehicles: take the operator out of the loop. Let the suit fly itself. The operator was the limiting reagent. Remove the reagent. Optimize the system.

The Stark answer to that argument is the one the entire Marvel arc was written to deliver. Stark in Endgame, pulling the Infinity Stones into his own gauntlet, looks at Thanos and says — and the timing of this line, on the day Bostrom dropped his paper, is too good to leave on the cutting room floor — “and I am Iron Man.” Then he snaps his fingers and dies. The suit does not save him. The operator inside the suit chooses what the suit is for. Stark could have let the suit fly itself. Stark could have let JARVIS take the gauntlet. Stark could have outsourced the snap to an agentic-first subsystem with a /goal command and a sub-perceptual latency budget. He did not. Stark made the operator decision, paid the operator price, and the trade closed.

That is the answer to Bostrom. The Big Retirement is the offer to never take the suit off. It is the offer to outsource the operator’s responsibility upward into a system that does not need the operator’s character, the operator’s judgment, the operator’s willingness to give Cap a Stark-grade answer when asked. The Bostrom bargain is the Altman parsing move, scaled to civilization. Concede the categorical risk in order to refuse the specific responsibility. Let the suit be the civilization. Retire the operator.

The CO/AI answer is the Stark answer. Build the four-layer suit. Fly it well. And remember that the suit is an instrument of the operator, not a replacement for them. The operator in 2026 is the harness. The suit is the rig. The Camry runs the inbox; the Ferrari runs the synthesis; the Idira identity-layer keeps the subsystems honest; the Murati interface keeps the human in the loop at the speed of human cognition. And at the end of the day, the operator takes the suit off, looks Cap in the eye, and gives the honest answer. I am Iron Man. Not because the suit makes me Iron Man. Because I am the one who chooses what the suit is for.

📁 What This Means For You

The four-layer suit is buyable today. The decision in front of every operator and every operator’s board this week is which of the four layers to optimize first, and the answer is the layer where your slowest piece currently lives. If your team is bottlenecked on ideation cycles and live conversation, prioritize the Murati-grade input layer. If your team is bottlenecked on agent sprawl and audit liability, prioritize the Idira-grade identity-and-orchestration layer. If your team is bottlenecked on cloud-API latency and frontier model cost, prioritize the Tunguz-grade Localmaxxing intelligence-routing audit. If your team is bottlenecked on output that nobody reads, prioritize the HTML-over-markdown output layer. Pick the slowest piece. Optimize there. Compound.

The hiring decision is now a slope decision. The non-vampire seat you fill at 2024 comp is a marginal hire; the vampire seat you fail to fill is a structural loss. The compounding curve is real, the Andreessen framing is the Twitter version of it, and the Cognition Devin doubling-every-8-weeks cadence is the price tag. Build your interview rubric around four-layer fluency.

The character question is the unsolved one. The four-layer stack does not make a non-trustworthy operator into Tony Stark. It makes them faster at being who they already are. The board conversation about culture and standard of conduct is now coupled to the board conversation about AI strategy, in a way it was not coupled even six months ago. The Altman deposition is a reminder, not an indictment. The suit does not redeem the operator. The operator’s character is the only thing that does, and you cannot retrofit it after the IPO.

🧠 Three Questions We Think You Should Be Asking Yourself

  1. Of the four layers — input, agent, intelligence, output — which is the slowest piece in your current AI workflow, and what is the buyable upgrade you could ship by July 1? (If you cannot answer, that is the answer. The audit is the first move.)
  2. What fraction of your team’s daily AI calls genuinely need a frontier model, and what is the latency improvement your team would see if half of them moved to a local 35B? (You do not know yet. Tunguz didn’t either until he ran the experiment.)
  3. If your most senior AI-fluent employee were on a witness stand tomorrow morning and Steven Molo asked them whether they always tell the truth, what answer would you want on the transcript? (And does your culture currently produce that answer, or the Altman one?)

The suit is the rig. The operator is the harness. The character is the only thing that doesn’t compound — and the only thing that has to.

— Harry & Anthony

Cross-references: This piece builds on the editorial line developed in Warp Speed, Fast And Slow (April 17), Mind The Gap (April 20), Brutalist (April 23), Distribution Is The Moat (April 26), Speed Eats Scale (April 27), Whose Side Is Sam Altman On? (April 28), AI Heat (April 30), Porsche In The Driveway (May 3), I Drink Your Milkshake (May 4), Anthropic, OpenAI And The Name Of The Game (May 5), No One Set Off My Evil Detector (May 6), What Would You Say You Do Here (May 7), Groundhog Day (May 10), and Lucy (May 11). The Iron Man frame will return. The four-layer suit is the spine of the next twelve months of operator coverage.
