Google’s Gemini 2.5 Pro sets new reasoning benchmark with 18.8% score

Google’s Gemini 2.5 represents a significant leap in AI reasoning capabilities, positioning the company at the forefront of the competitive AI landscape. With benchmark scores substantially higher than rival systems, this latest model demonstrates Google’s commitment to rapid AI advancement through frequent, meaningful updates. The new version’s enhanced thinking capabilities signal a shift toward AI systems that can tackle increasingly complex problems while supporting more context-aware applications.

The big picture: Google has unveiled Gemini 2.5 Pro Experimental, which it claims is its “most intelligent AI model” yet, featuring substantially improved reasoning capabilities.

The new model combines an enhanced base architecture with improved post-training to achieve what Google describes as “a new level of performance.”
Available immediately to Gemini Advanced subscribers, this release continues Google’s aggressive pace of AI development following recent launches of Gemini Deep Research and updates to NotebookLM.

Key details: Gemini 2.5’s enhanced thinking capabilities will be incorporated into all future Google AI models to tackle more complex problems.

Users can access the experimental model through Google’s AI Studio or, with a Gemini Advanced subscription, through the Gemini app.
Google plans to release additional 2.5 models, with pricing for scaled production use to be announced in the coming weeks.

Impressive benchmarks: The new model scored 18.8% on Humanity’s Last Exam, significantly outperforming competitors’ AI systems on this challenging benchmark.

This score represents the highest ever achieved on this demanding test without tool use, surpassing OpenAI’s o3-mini (14%) and DeepSeek R1 (8.6%).
Google characterizes Gemini 2.5 Pro’s reasoning capabilities as “state-of-the-art,” a claim supported by these benchmark results.

Why this matters: Google’s rapid succession of AI updates demonstrates its commitment to maintaining competitive advantage in the increasingly crowded AI development space.

The significant improvement in reasoning capabilities could translate to more capable AI assistants for both consumer and enterprise applications.
The company’s focus on reasoning suggests a shift toward AI systems that can handle more nuanced, complex tasks rather than simply generating content.

