Google DeepMind launches Gemini 2.5 Computer Use to control web browsers

Google DeepMind has launched Gemini 2.5 Computer Use, an AI model that can autonomously navigate web browsers by clicking, typing, and scrolling through websites like a human user. The model joins similar offerings from OpenAI and Anthropic in the emerging field of AI agents capable of performing web-based tasks with minimal human oversight.

What you should know: Gemini 2.5 Computer Use operates through natural language prompts and can execute complex multi-step web tasks independently.

Users simply provide instructions like “Open Wikipedia, search for ‘Atlantis,’ and summarize the history of the myth in Western thought,” and the model handles the entire process autonomously.
The AI takes screenshots of web pages to analyze user interfaces, then performs requested actions step-by-step while explaining its reasoning in a visible text box.
For sensitive tasks like making purchases, the model will ask for user confirmation before proceeding.

How it works: The model runs in an iterative loop, carrying context from its previous actions forward within a particular interface.

As it performs more actions on a specific website, it accumulates context about that interface, so later steps run more smoothly.
Google demonstrated the technology through sped-up videos showing the model updating customer relationship management systems and rearranging notes on Google’s discontinued Jamboard platform.
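
The loop described above can be sketched roughly as follows. This is an illustrative outline only, not Google's implementation or the Gemini API: the names `Action`, `AgentLoop`, `propose`, `screenshot`, and `execute` are hypothetical, and the "model" here is a scripted stand-in so the example runs without any network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str          # hypothetical action types: "click", "type", "scroll", "done"
    detail: str = ""   # free-form target or text for the action


@dataclass
class AgentLoop:
    # All three callables are assumptions standing in for a real model and browser.
    propose: Callable[[str, bytes, List[Action]], Action]
    screenshot: Callable[[], bytes]
    execute: Callable[[Action], None]
    history: List[Action] = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 10) -> List[Action]:
        for _ in range(max_steps):
            # Each turn sees the goal, a fresh screenshot, and all prior actions.
            action = self.propose(goal, self.screenshot(), self.history)
            self.history.append(action)
            if action.kind == "done":
                break
            self.execute(action)
        return self.history


def scripted_model(goal: str, image: bytes, history: List[Action]) -> Action:
    # Stand-in for a model call: replays a fixed plan based on how many steps ran.
    script = [Action("click", "search box"), Action("type", "Atlantis"), Action("done")]
    return script[len(history)]


executed: List[Action] = []
loop = AgentLoop(propose=scripted_model, screenshot=lambda: b"", execute=executed.append)
trace = loop.run("Search Wikipedia for Atlantis")
print([a.kind for a in trace])
```

The point of the sketch is the shape of the loop: the action history is passed back in on every turn, which is one plausible way context could build up across steps on the same site.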

Performance benchmarks: Google claims Gemini 2.5 Computer Use outperformed competing tools from Anthropic and OpenAI across multiple evaluation metrics.

The model demonstrated superior accuracy and latency performance across “multiple web and mobile control benchmarks,” including Online-Mind2Web, an evaluation framework specifically designed for testing web-browsing agents.
While primarily designed for web browsers, Google noted the model also shows “strong promise” on mobile platforms.

Availability and access: The model is currently available through multiple channels for developers and researchers.

Access is provided through the Gemini API in Google AI Studio and through Vertex AI for enterprise users.
A demo version is also accessible via Browserbase for those wanting to test the technology.

Safety considerations: Google has implemented multiple safeguards to prevent misuse and unintended consequences.

Developers can configure safety controls to prevent the model from bypassing CAPTCHAs, compromising data security, or gaining control of medical devices.
The system can be programmed to request user confirmation before performing specified sensitive actions.
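
As a rough illustration of such a confirmation gate, the sketch below blocks a configurable set of sensitive action types until a confirmation callback approves them. It is an assumption about how a developer might wire this up on their own side, not Google's safety API: `SENSITIVE_KINDS`, `guarded_execute`, and the action names are all hypothetical.

```python
from typing import Callable, List, Set

# Hypothetical set of action types a developer has chosen to gate.
SENSITIVE_KINDS: Set[str] = {"purchase", "submit_payment"}


def guarded_execute(
    kind: str,
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
    sensitive: Set[str] = SENSITIVE_KINDS,
) -> bool:
    """Run the action unless it is sensitive and the user declines it."""
    if kind in sensitive and not confirm(kind):
        return False  # blocked: the user did not approve
    execute(kind)
    return True


ran: List[str] = []
guarded_execute("scroll", ran.append, confirm=lambda k: False)    # not sensitive, runs
guarded_execute("purchase", ran.append, confirm=lambda k: False)  # declined, skipped
guarded_execute("purchase", ran.append, confirm=lambda k: True)   # approved, runs
print(ran)
```

In a real deployment the `confirm` callback would prompt a person rather than return a fixed value.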

Known limitations: Google acknowledges the model inherits fundamental weaknesses from its underlying Gemini 2.5 Pro foundation.

The company’s system card notes the model “may exhibit some of the general limitations of foundation models…such as hallucinations, and limitations around causal understanding, complex logical deduction, and counterfactual reasoning.”
These limitations reflect broader challenges across frontier AI models, as highlighted by recent Anthropic research showing AI systems often misinterpret harmless information as potentially unethical or illegal.

Competitive landscape: This launch positions Google alongside other major AI companies developing autonomous web navigation capabilities.

The release follows similar computer use models from OpenAI and Anthropic, indicating growing industry focus on AI agents that can operate independently across digital environments.
Google previously experimented with Project Mariner, a Chrome extension with similar web automation capabilities, suggesting sustained investment in this technology area.
