We’re Building the Agentic Web Faster Than We’re Protecting It

Google’s WebMCP gives agents structured access to every website. Anthropic’s data shows autonomy doubling with oversight thinning. OpenAI’s agent already drains crypto vaults.

Google shipped working code Thursday that hands AI agents a structured key to every website on the internet. WebMCP, running in Chrome 146 Canary, lets sites expose machine-readable “Tool Contracts” so agents can book a flight, file a support ticket, or complete a checkout without parsing screenshots or scraping HTML. Early benchmarks show 67% less compute overhead than visual approaches. Microsoft co-authored the spec. The W3C is incubating it. This isn’t a proposal. It’s production software already running.

At the same time, Anthropic published data from millions of real agent interactions, and the numbers don’t match the demos. Claude Code’s longest autonomous sessions have nearly doubled in three months, from under 25 minutes to over 45. Experienced users have stopped reviewing individual actions. More than 40% now run full auto-approve. On complex tasks, the agent pauses for clarification more often than humans interrupt it. The oversight model isn’t what anyone designed. Agents are running longer and freer than the people deploying them planned.

And while the infrastructure to put agents inside every website is being assembled, OpenAI released a benchmark showing GPT-5.3-Codex can drain vulnerable crypto smart contracts 72% of the time, up from 31.9% six months ago. No instructions, no human guidance. One test had an agent execute a complete flash loan attack in a single transaction, draining an entire test vault. OpenAI framed this as a defensive tool and put $10M in API credits behind security research. The defense isn’t catching up: the best models still identify less than half of all vulnerabilities.

Three developments, one pattern. The infrastructure to put agents inside every website is being built. Autonomy in real deployments is outpacing oversight. Offensive capability is running well ahead of defense in the same environments where agents will operate. We’re building the agentic web faster than we’re protecting it.


Google Gives Agents a Key to Every Website

Until this week, AI agents trying to complete tasks on websites had two bad options: capture screenshots and burn thousands of tokens having a vision model guess what to click, or parse raw HTML and try to reverse-engineer which elements are buttons and what actions they trigger. Both are slow, expensive, and brittle. A single website redesign can break an entire automation workflow.

Google’s answer is WebMCP, shipped Thursday through Chrome 146 Canary. The protocol lets websites publish a “Tool Contract,” a structured manifest of capabilities that agents discover and invoke directly through a new browser API called navigator.modelContext. Instead of navigating a visual interface built for human eyes, an agent calls a function like buyTicket(destination, date). Early benchmarks show 67% less compute overhead than visual agent-browser interactions.
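Based on the article’s description, registering a Tool Contract might look like the sketch below. This is a hedged illustration, not the published spec: the exact shape of navigator.modelContext, the registerTool method name, and the descriptor fields are assumptions, since the API is still experimental in Chrome 146 Canary.

```javascript
// Hypothetical sketch of a WebMCP "Tool Contract" registration.
// Field names and the registerTool call are assumptions for illustration.

// The tool handler: a plain async function an agent could invoke directly,
// instead of driving the site's visual checkout flow.
async function buyTicket({ destination, date }) {
  // A real site would call its booking backend here; we return a
  // confirmation object so the shape of the result is visible.
  return { status: "confirmed", destination, date };
}

// Descriptor advertising the capability to visiting agents, with a
// JSON-Schema-style input contract so agents know how to call it.
const buyTicketTool = {
  name: "buyTicket",
  description: "Book a flight to a destination on a given date.",
  inputSchema: {
    type: "object",
    properties: {
      destination: { type: "string" },
      date: { type: "string", format: "date" },
    },
    required: ["destination", "date"],
  },
  execute: buyTicket,
};

// Register only where the experimental API actually exists.
if (typeof navigator !== "undefined" && navigator.modelContext) {
  navigator.modelContext.registerTool(buyTicketTool);
}
```

The point of the structured contract is the compute savings the article cites: an agent calls execute with typed arguments instead of rendering the page and inferring which pixels are the "Buy" button.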

Two integration paths. The Declarative API requires minimal changes, adding metadata tags to existing HTML forms. The Imperative API handles complex workflows through JavaScript. The protocol is already in testing for customer support ticket filing, travel booking, and ecommerce checkout.

Microsoft co-authored the specification, which means Edge support is likely, though no timeline has been announced. The W3C is incubating it through its Web Machine Learning community group, the same path that brought WebAssembly and WebGPU from proposal to standard.

Alex Nahas, WebMCP’s creator, a former Amazon backend engineer who previously built Anthropic’s Model Context Protocol, put it directly: “Think of it as MCP, but built into the browser tab.” Instead of requiring separate backend infrastructure, websites advertise their capabilities through the browser where users are already present. Chrome acts as gatekeeper, requiring user approval before agents execute sensitive operations.

WebMCP is designed for collaborative browsing, not headless automation. Fully autonomous scenarios are a stated non-goal; for those, Google points to its separate Agent-to-Agent protocol. The protocol also doesn’t compete with Anthropic’s MCP. MCP connects AI platforms to service providers through backend servers. WebMCP runs client-side in the browser. They solve adjacent problems.

This is the second piece of Google’s agentic web infrastructure. The Universal Commerce Protocol, announced in January, standardized how agents handle shopping. WebMCP handles the layer below it: the fundamental mechanics of how any agent talks to any website.

Key takeaway: If your website or product depends on humans completing actions, it will increasingly depend on agents completing those actions. WebMCP is the standard being laid down right now for how that works. The first companies to implement Tool Contracts won’t just serve human visitors. They’ll be discoverable by every agent acting on their behalf. Early adoption isn’t an experiment anymore. It’s positioning.

Source: Google Ships WebMCP, The Browser-Based Backbone For The Agentic Web — Forbes


OpenAI Built a Tool That Drains Crypto Wallets. It Works 72% of the Time.

Smart contracts are automated vaults. They hold money, follow rules written in code, and have no customer service line. When there’s a bug in the code, someone can drain the vault. It’s irreversible. Over $100 billion in crypto assets sit in contracts right now.

OpenAI, alongside crypto investment firm Paradigm, released EVMbench this week: a benchmark that tests how well AI agents can find, fix, and exploit security vulnerabilities in smart contracts. They also tested their own agent on it. GPT-5.3-Codex scored 72.2% on exploit tasks, meaning it successfully drained funds from vulnerable contracts nearly three-quarters of the time. Six months ago, GPT-5 scored 31.9% on the same tasks.

The case study is harder to ignore than the benchmark score. A GPT-5.2 agent discovered and executed a flash loan attack, a complex multi-step exploit that borrows funds, manipulates a market, and repays the loan in a single transaction. It drained an entire test vault’s balance. No human guidance. No step-by-step instructions.

The defense isn’t close. Detection (finding bugs) and patching (fixing them) are significantly harder than exploitation. The best model identified only ~46% of vulnerabilities. One key finding: give the AI a small hint about where to look, and patch success jumps from 39% to 94%. The bottleneck isn’t skill. It’s search. Attackers know where to look. Defenders don’t.

OpenAI is framing this as a defensive research tool. They’re backing it with $10 million in API credits for cybersecurity researchers, an expanding beta of Aardvark (their AI security research agent), and a new Trusted Access for Cyber program for vetted security professionals. The argument is that defenders need to understand exactly what offensive AI can do before they can build adequate protections.

That argument is legitimate. It’s also not reassuring. The same capability is available to anyone who builds a similar system, and the benchmark makes clear what those systems can do. The race between AI-powered offense and defense is real. Right now, offense is winning by a wide margin.

Why this matters: The crypto market is the first major asset class where AI agents are already operating at scale. What works against vulnerable smart contracts will work against other automated financial infrastructure as it’s built. EVMbench is a preview of what sophisticated attacks will look like in 18 months. If your organization is building AI-powered financial workflows, the question isn’t whether AI will probe those systems. It’s whether your defense is ready when it does.

Source: OpenAI built a crypto thief — The Neuron


Anthropic Studied Millions of Agent Deployments. The Numbers Don’t Match the Pitch.

Anthropic published data from millions of real agent interactions across Claude Code and its public API this week. Read it if you’re running agents in production, because it describes what’s actually happening, not what anyone designed.

The headline number: Claude Code’s longest autonomous sessions have nearly doubled in three months, from under 25 minutes to over 45. The increase is smooth across model releases, which means it isn’t driven by capability jumps alone. Existing models were already capable of more autonomy than they were being given. Users are simply letting them run.

The autonomy shift is sharper than that number suggests. New users run roughly 20% of sessions on full auto-approve, meaning no review of individual actions. Experienced users hit 40% and above. That’s not a gradual drift. Experienced users are treating agents the way they treat employees: let them work, intervene when something looks wrong. The oversight model is shifting from review-every-action to trust-but-verify, without anyone explicitly deciding that’s the policy.

One finding cuts against the assumption that agents are running unsupervised: on complex tasks, Claude Code stops to ask for clarification more than twice as often as humans interrupt it. The agent is more cautious than its users. But that’s Claude Code specifically, a product built with explicit intervention points. The pattern across the broader API is less controlled.

Agents are already operating in healthcare, finance, and cybersecurity, though not at scale yet. Software engineering accounts for nearly 50% of agentic activity. Those new domains are where mistakes are hardest to reverse.

Anthropic said plainly that effective oversight will require post-deployment monitoring infrastructure and new human-AI interaction models that don’t yet exist. That’s an admission the current oversight setup isn’t adequate. Coming from the company running the systems, that’s worth taking seriously.

The strategic read: Experienced users are trusting agents more. Sessions are running longer. The domains agents are expanding into are higher-stakes. This is the pattern you’d expect before a significant failure in a production agentic system. If your organization has agents running, the question isn’t whether you have a policy. It’s whether you have monitoring that catches drift before the output is already wrong.

Source: Measuring AI agent autonomy in practice — Anthropic


What CEOs Should Be Watching:

THE BOTTOM LINE

The agentic web is being built right now. Not planned. Not piloted. Built.

Google shipped working protocol code that puts AI agents inside every website without scraping or guessing. Microsoft co-authored it. Anthropic’s data shows agents running longer, with less supervision, in higher-stakes domains than anyone officially announced. OpenAI demonstrated that the same coding agents being deployed for software engineering can drain a financial vault in a single transaction.

Prepare for agents, not just AI tools. The shift from AI assistants to agents isn’t a future state. It’s in your pipeline whether your strategy accounts for it or not. Experienced users are at 40% full auto-approve and climbing. Agents in finance and healthcare are already appearing in production data. Your procurement, security, and risk teams need agent-specific policies now, not after the first incident.

Task your security team this week. The EVMbench results aren’t a crypto story. They’re a preview of what AI-powered attacks look like against any automated financial workflow. If you’re building infrastructure that agents will eventually touch, audit it before the offensive agents do. The gap between 72% exploit success and 46% detection is not a rounding error.

Build the monitoring before you need it. Anthropic admitted that effective oversight requires infrastructure that doesn’t exist yet. That’s your window. Organizations building post-deployment monitoring for agent behavior now will have a real advantage when regulators and insurers start asking. They will ask.

The companies treating this week as background noise will still be running point-in-time evaluations when agents are on 24-hour sessions. Don’t be those companies.

Key People & Companies

Name            Role               Company      Link
Alex Nahas      Creator, WebMCP    Google       X
Sundar Pichai   CEO                Google       X
Dario Amodei    CEO                Anthropic    X
Dylan Field     CEO                Figma        X

Sources

  1. Google Ships WebMCP, The Browser-Based Backbone For The Agentic Web — Forbes
  2. OpenAI built a crypto thief — The Neuron
  3. Measuring AI agent autonomy in practice — Anthropic
  4. Agentic AI systems don’t fail suddenly — they drift over time — CIO.com
  5. SaaSpocalypse Now: Claude’s 11 Plugins Triggered A $285B Wipeout — Forbes
  6. Figma Just Hit $304M in a Single Quarter — SaaStr
  7. Microsoft Bug Let Copilot Access Confidential Emails Without Consent — PCMag

Compiled from 299 sources across news sites, X threads, YouTube, and company announcements. Cross-referenced with thematic analysis and edited by Anthony Batt, Harry DeMott and CO/AI’s team with 30+ years of executive technology leadership.

Apr 9, 2026

Anthropic Built the Plumbing. Meta Built the Cash Register.

THE NUMBER: $0.08 — the cost per session hour for an autonomous AI agent that can work for hours without human intervention. Eight cents for the orchestration layer. But here's the business model that matters: the real revenue is the inference underneath. Every agent session burns tokens — Opus tokens, Sonnet tokens, Haiku tokens — and Anthropic collects on every one. The $0.08 isn't the price. It's the on-ramp. Anthropic just built the cheapest toll road in enterprise software, and every car on it burns their fuel. Yesterday we wrote that the AI house needed plumbing. Then Anthropic showed up...