Google DeepMind has launched Gemini 2.5 Computer Use, an AI model that can autonomously navigate web browsers by clicking, typing, and scrolling through websites like a human user. The model joins similar offerings from OpenAI and Anthropic in the emerging field of AI agents capable of performing web-based tasks with minimal human oversight.
What you should know: Gemini 2.5 Computer Use operates through natural language prompts and can execute complex multi-step web tasks independently.
- Users simply provide instructions like “Open Wikipedia, search for ‘Atlantis,’ and summarize the history of the myth in Western thought,” and the model handles the entire process autonomously.
- The AI takes screenshots of web pages to analyze user interfaces, then performs requested actions step-by-step while explaining its reasoning in a visible text box.
- For sensitive tasks like making purchases, the model will ask for user confirmation before proceeding.
How it works: The model runs in an iterative loop, building context from its previous actions within a given interface.
- As it performs more actions on a specific website, it accumulates more context about that interface, which makes later steps smoother.
- Google demonstrated the technology through sped-up videos showing the model updating customer relationship management systems and rearranging notes on Google’s discontinued Jamboard platform.
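The loop described above can be sketched in a few lines of Python. This is only an illustration of the general screenshot, propose, act pattern, not Google's implementation or the Gemini API: `Action`, `AgentState`, `run_agent`, and the callback names are all hypothetical, and the "model" here is a scripted stand-in.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str            # hypothetical labels: "click", "type", "scroll", "done"
    detail: str = ""     # target element or text to enter
    reasoning: str = ""  # the model's explanation for this step


@dataclass
class AgentState:
    goal: str
    history: List[Action] = field(default_factory=list)


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    propose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> AgentState:
    """Repeat: capture the screen, ask the model for one action, perform it."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        screenshot = take_screenshot()
        # The model sees the goal, the current screen, and all earlier actions,
        # which is how context accumulates across iterations.
        action = propose_action(goal, screenshot, state.history)
        state.history.append(action)
        if action.kind == "done":
            break
        execute(action)
    return state


def demo():
    """Drive the loop with a scripted stand-in for the model and browser."""
    script = [
        Action("click", "search box", "Focus the search field"),
        Action("type", "Atlantis", "Enter the query"),
        Action("done", "", "Results are visible"),
    ]
    executed: List[Action] = []

    def scripted_model(goal, screenshot, history):
        # Pick the next scripted step based on how many actions came before.
        return script[len(history)]

    state = run_agent(
        "Search Wikipedia for Atlantis",
        take_screenshot=lambda: b"fake-screenshot",
        propose_action=scripted_model,
        execute=executed.append,
    )
    return state, executed
```

In a real deployment the scripted stand-in would be replaced by a call to the model and by a browser automation layer; the `max_steps` cap is an added assumption to keep a confused agent from looping forever.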
Performance benchmarks: Google claims Gemini 2.5 Computer Use outperformed competing tools from Anthropic and OpenAI across multiple evaluation metrics.
- Google says the model delivered higher accuracy and lower latency across “multiple web and mobile control benchmarks,” including Online-Mind2Web, an evaluation framework designed specifically for testing web-browsing agents.
- While primarily designed for web browsers, Google noted the model also shows “strong promise” on mobile platforms.
Availability and access: The model is currently available through multiple channels for developers and researchers.
- Access is provided through the Gemini API in Google AI Studio and through Vertex AI for enterprise users.
- A demo version is also accessible via Browserbase for those wanting to test the technology.
Safety considerations: Google has implemented multiple safeguards to prevent misuse and unintended consequences.
- Developers can configure safety controls to prevent the model from bypassing CAPTCHAs, compromising data security, or gaining control of medical devices.
- The system can be programmed to request user confirmation before performing specified sensitive actions.
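A confirmation gate of this kind can be sketched as a thin wrapper around whatever code performs the action. The example below is a hypothetical illustration, not the configuration mechanism Google ships: the `SENSITIVE_KINDS` labels, `guarded_execute`, and the `confirm` callback are assumptions made for the sketch.

```python
from typing import Callable, FrozenSet

# Hypothetical action labels that a developer has marked as sensitive.
SENSITIVE_KINDS: FrozenSet[str] = frozenset({"purchase", "submit_payment"})


def guarded_execute(
    action_kind: str,
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
    sensitive: FrozenSet[str] = SENSITIVE_KINDS,
) -> bool:
    """Run the action, but ask the user first if it is marked sensitive.

    Returns True if the action ran and False if the user declined it.
    """
    if action_kind in sensitive and not confirm(action_kind):
        return False
    execute(action_kind)
    return True
```

Here `confirm` could be a prompt in the user interface; routine actions such as scrolling pass straight through, while a declined purchase is never executed.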
Known limitations: Google acknowledges the model inherits fundamental weaknesses from its underlying Gemini 2.5 Pro foundation.
- The company’s system card notes the model “may exhibit some of the general limitations of foundation models…such as hallucinations, and limitations around causal understanding, complex logical deduction, and counterfactual reasoning.”
- These limitations reflect broader challenges across frontier AI models, as highlighted by recent Anthropic research showing AI systems often misinterpret harmless information as potentially unethical or illegal.
Competitive landscape: This launch positions Google alongside other major AI companies developing autonomous web navigation capabilities.
- The release follows similar computer use models from OpenAI and Anthropic, indicating growing industry focus on AI agents that can operate independently across digital environments.
- Google previously experimented with Project Mariner, a Chrome extension with similar web automation capabilities, suggesting sustained investment in this technology area.