We’re Building the Agentic Web Faster Than We’re Protecting It

Google’s WebMCP gives agents structured access to every website. Anthropic’s data shows autonomy doubling with oversight thinning. OpenAI’s agent already drains crypto vaults.

Google shipped working code Thursday that hands AI agents a structured key to every website on the internet. WebMCP, running in Chrome 146 Canary, lets sites expose machine-readable “Tool Contracts” so agents can book a flight, file a support ticket, or complete a checkout without parsing screenshots or scraping HTML. Early benchmarks show 67% less compute overhead than visual approaches. Microsoft co-authored the spec. The W3C is incubating it. This isn’t a proposal. It’s production software already running.

At the same time, Anthropic published data from millions of real agent interactions, and the numbers don’t match the demos. Claude Code’s longest autonomous sessions have nearly doubled in three months, from under 25 minutes to over 45. Experienced users have stopped reviewing individual actions. More than 40% now run full auto-approve. On complex tasks, the agent pauses for clarification more often than humans interrupt it. The oversight model isn’t what anyone designed. Agents are running longer and freer than the people deploying them planned.

And while the infrastructure to put agents inside every website is being assembled, OpenAI released a benchmark showing GPT-5.3-Codex can drain vulnerable crypto smart contracts 72% of the time, up from 31.9% six months ago. No instructions, no human guidance. One test had an agent execute a complete flash loan attack in a single transaction, draining an entire test vault. OpenAI framed this as a defensive tool and put $10M in API credits behind security research. The defense isn’t catching up: the best models still identify less than half of all vulnerabilities.

Three developments, one pattern. The infrastructure to put agents inside every website is being built. Autonomy in real deployments is outpacing oversight. Offensive capability is running well ahead of defense in the same environments where agents will operate. We’re building the agentic web faster than we’re protecting it.


Google Gives Agents a Key to Every Website

Until this week, AI agents trying to complete tasks on websites had two bad options: capture screenshots and burn thousands of tokens having a vision model guess what to click, or parse raw HTML and try to reverse-engineer which elements are buttons and what actions they trigger. Both are slow, expensive, and brittle. A single website redesign can break an entire automation workflow.

Google’s answer is WebMCP, shipped Thursday through Chrome 146 Canary. The protocol lets websites publish a “Tool Contract,” a structured manifest of capabilities that agents discover and invoke directly through a new browser API called navigator.modelContext. Instead of navigating a visual interface built for human eyes, an agent calls a function like buyTicket(destination, date). Early benchmarks show 67% less compute overhead than visual agent-browser interactions.
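Based on the article’s description, registering a Tool Contract might look like the sketch below. This is a hedged illustration, not the published spec: the exact shape of navigator.modelContext, the registerTool method name, and the descriptor fields are assumptions, since the API is still experimental in Chrome 146 Canary.

```javascript
// Hypothetical sketch of a WebMCP "Tool Contract" registration.
// Field names and the registerTool call are assumptions for illustration.

// The tool handler: a plain async function an agent could invoke directly,
// instead of driving the site's visual checkout flow.
async function buyTicket({ destination, date }) {
  // A real site would call its booking backend here; we return a
  // confirmation object so the shape of the result is visible.
  return { status: "confirmed", destination, date };
}

// Descriptor advertising the capability to visiting agents, with a
// JSON-Schema-style input contract so agents know how to call it.
const buyTicketTool = {
  name: "buyTicket",
  description: "Book a flight to a destination on a given date.",
  inputSchema: {
    type: "object",
    properties: {
      destination: { type: "string" },
      date: { type: "string", format: "date" },
    },
    required: ["destination", "date"],
  },
  execute: buyTicket,
};

// Register only where the experimental API actually exists.
if (typeof navigator !== "undefined" && navigator.modelContext) {
  navigator.modelContext.registerTool(buyTicketTool);
}
```

The point of the structured contract is the compute savings the article cites: an agent calls execute with typed arguments instead of rendering the page and inferring which pixels are the "Buy" button.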

Two integration paths. The Declarative API requires minimal changes, adding metadata tags to existing HTML forms. The Imperative API handles complex workflows through JavaScript. The protocol is already in testing for customer support ticket filing, travel booking, and ecommerce checkout.

Microsoft co-authored the specification, which means Edge support is likely, though no timeline has been announced. The W3C is incubating it through its Web Machine Learning community group, the same path that brought WebAssembly and WebGPU from proposal to standard.

Alex Nahas, WebMCP’s creator, a former Amazon backend engineer who previously built Anthropic’s Model Context Protocol, put it directly: “Think of it as MCP, but built into the browser tab.” Instead of requiring separate backend infrastructure, websites advertise their capabilities through the browser where users are already present. Chrome acts as gatekeeper, requiring user approval before agents execute sensitive operations.

WebMCP is designed for collaborative browsing, not headless automation. Fully autonomous scenarios are a stated non-goal; for those, Google points to its separate Agent-to-Agent protocol. The protocol also doesn’t compete with Anthropic’s MCP. MCP connects AI platforms to service providers through backend servers. WebMCP runs client-side in the browser. They solve adjacent problems.

This is the second piece of Google’s agentic web infrastructure. The Universal Commerce Protocol, announced in January, standardized how agents handle shopping. WebMCP handles the layer below it: the fundamental mechanics of how any agent talks to any website.

Key takeaway: If your website or product depends on humans completing actions, it will increasingly depend on agents completing those actions. WebMCP is the standard being laid down right now for how that works. The first companies to implement Tool Contracts won’t just serve human visitors. They’ll be discoverable by every agent acting on their behalf. Early adoption isn’t an experiment anymore. It’s positioning.

Source: Google Ships WebMCP, The Browser-Based Backbone For The Agentic Web — Forbes


OpenAI Built a Tool That Drains Crypto Wallets. It Works 72% of the Time.

Smart contracts are automated vaults. They hold money, follow rules written in code, and have no customer service line. When there’s a bug in the code, someone can drain the vault. It’s irreversible. Over $100 billion in crypto assets sit in contracts right now.

OpenAI, alongside crypto investment firm Paradigm, released EVMbench this week: a benchmark that tests how well AI agents can find, fix, and exploit security vulnerabilities in smart contracts. They also tested their own agent on it. GPT-5.3-Codex scored 72.2% on exploit tasks, meaning it successfully drained funds from vulnerable contracts nearly three-quarters of the time. Six months ago, GPT-5 scored 31.9% on the same tasks.

The case study is harder to ignore than the benchmark score. A GPT-5.2 agent discovered and executed a flash loan attack, a complex multi-step exploit that borrows funds, manipulates a market, and repays the loan in a single transaction. It drained an entire test vault’s balance. No human guidance. No step-by-step instructions.

The defense isn’t close. Detection (finding bugs) and patching (fixing them) are significantly harder than exploitation. The best model identified only ~46% of vulnerabilities. One key finding: give the AI a small hint about where to look, and patch success jumps from 39% to 94%. The bottleneck isn’t skill. It’s search. Attackers know where to look. Defenders don’t.

OpenAI is framing this as a defensive research tool. They’re backing it with $10 million in API credits for cybersecurity researchers, an expanding beta of Aardvark (their AI security research agent), and a new Trusted Access for Cyber program for vetted security professionals. The argument is that defenders need to understand exactly what offensive AI can do before they can build adequate protections.

That argument is legitimate. It’s also not reassuring. The same capability is available to anyone who builds a similar system, and the benchmark makes clear what those systems can do. The race between AI-powered offense and defense is real. Right now, offense is winning by a wide margin.

Why this matters: The crypto market is the first major asset class where AI agents are already operating at scale. What works against vulnerable smart contracts will work against other automated financial infrastructure as it’s built. EVMbench is a preview of what sophisticated attacks will look like in 18 months. If your organization is building AI-powered financial workflows, the question isn’t whether AI will probe those systems. It’s whether your defense is ready when it does.

Source: OpenAI built a crypto thief — The Neuron


Anthropic Studied Millions of Agent Deployments. The Numbers Don’t Match the Pitch.

Anthropic published data from millions of real agent interactions across Claude Code and its public API this week. Read it if you’re running agents in production, because it describes what’s actually happening, not what anyone designed.

The headline number: Claude Code’s longest autonomous sessions have nearly doubled in three months, from under 25 minutes to over 45. The increase is smooth across model releases, which means it isn’t driven by capability jumps alone. Existing models were already capable of more autonomy than they were being given. Users are simply letting them run.

The autonomy shift is sharper than that number suggests. New users run roughly 20% of sessions on full auto-approve, meaning no review of individual actions. Experienced users hit 40% and above. That’s not a gradual drift. Experienced users are treating agents the way they treat employees: let them work, intervene when something looks wrong. The oversight model is shifting from review-every-action to trust-but-verify, without anyone explicitly deciding that’s the policy.

One finding cuts against the assumption that agents are running unsupervised: on complex tasks, Claude Code stops to ask for clarification more than twice as often as humans interrupt it. The agent is more cautious than its users. But that’s Claude Code specifically, a product built with explicit intervention points. The pattern across the broader API is less controlled.

Agents are already operating in healthcare, finance, and cybersecurity, though not at scale yet. Software engineering accounts for nearly 50% of agentic activity. Those new domains are where mistakes are hardest to reverse.

Anthropic said plainly that effective oversight will require post-deployment monitoring infrastructure and new human-AI interaction models that don’t yet exist. That’s an admission the current oversight setup isn’t adequate. Coming from the company running the systems, that’s worth taking seriously.

The strategic read: Experienced users are trusting agents more. Sessions are running longer. The domains agents are expanding into are higher-stakes. This is the pattern you’d expect before a significant failure in a production agentic system. If your organization has agents running, the question isn’t whether you have a policy. It’s whether you have monitoring that catches drift before the output is already wrong.

Source: Measuring AI agent autonomy in practice — Anthropic


What CEOs Should Be Watching:

THE BOTTOM LINE

The agentic web is being built right now. Not planned. Not piloted. Built.

Google shipped working protocol code that puts AI agents inside every website without scraping or guessing. Microsoft co-authored it. Anthropic’s data shows agents running longer, with less supervision, in higher-stakes domains than anyone officially announced. OpenAI demonstrated that the same coding agents being deployed for software engineering can drain a financial vault in a single transaction.

Prepare for agents, not just AI tools. The shift from AI assistants to agents isn’t a future state. It’s in your pipeline whether your strategy accounts for it or not. Experienced users are at 40% full auto-approve and climbing. Agents in finance and healthcare are already appearing in production data. Your procurement, security, and risk teams need agent-specific policies now, not after the first incident.

Task your security team this week. The EVMbench results aren’t a crypto story. They’re a preview of what AI-powered attacks look like against any automated financial workflow. If you’re building infrastructure that agents will eventually touch, audit it before the offensive agents do. The gap between 72% exploit success and 46% detection is not a rounding error.

Build the monitoring before you need it. Anthropic admitted that effective oversight requires infrastructure that doesn’t exist yet. That’s your window. Organizations building post-deployment monitoring for agent behavior now will have a real advantage when regulators and insurers start asking. They will ask.

The companies treating this week as background noise will still be running point-in-time evaluations when agents are on 24-hour sessions. Don’t be those companies.

Key People & Companies

Name            Role               Company      Link
Alex Nahas      Creator, WebMCP    Google       X
Sundar Pichai   CEO                Google       X
Dario Amodei    CEO                Anthropic    X
Dylan Field     CEO                Figma        X

Sources

  1. Google Ships WebMCP, The Browser-Based Backbone For The Agentic Web — Forbes
  2. OpenAI built a crypto thief — The Neuron
  3. Measuring AI agent autonomy in practice — Anthropic
  4. Agentic AI systems don’t fail suddenly — they drift over time — CIO.com
  5. SaaSpocalypse Now: Claude’s 11 Plugins Triggered A $285B Wipeout — Forbes
  6. Figma Just Hit $304M in a Single Quarter — SaaStr
  7. Microsoft Bug Let Copilot Access Confidential Emails Without Consent — PCMag

Compiled from 299 sources across news sites, X threads, YouTube, and company announcements. Cross-referenced with thematic analysis and edited by Anthony Batt, Harry DeMott and CO/AI’s team with 30+ years of executive technology leadership.

Apr 9, 2026

Anthropic Built the Plumbing. Meta Built the Cash Register.

THE NUMBER: $0.08 — the cost per session hour for an autonomous AI agent that can work for hours without human intervention. Eight cents for the orchestration layer. But here's the business model that matters: the real revenue is the inference underneath. Every agent session burns tokens — Opus tokens, Sonnet tokens, Haiku tokens — and Anthropic collects on every one. The $0.08 isn't the price. It's the on-ramp. Anthropic just built the cheapest toll road in enterprise software, and every car on it burns their fuel. Yesterday we wrote that the AI house needed plumbing. Then Anthropic showed up...