Recent testing shows DeepSeek hallucinates much more than competing models

A new AI reasoning model from DeepSeek produces significantly more false or hallucinated responses than comparable AI models, according to testing by enterprise AI startup Vectara.

Key findings: Vectara’s testing revealed that DeepSeek’s R1 model shows notably higher rates of hallucination than other reasoning and open-source AI models.

  • OpenAI and Google’s closed reasoning models showed the lowest rates of hallucination in the tests
  • Alibaba’s Qwen model performed best among models with partially public code
  • DeepSeek’s earlier V3 model, which served as the foundation for R1, hallucinated at roughly one-third the rate of its successor

Technical context: AI hallucination occurs when an AI model generates false or fabricated information while presenting it as accurate; a brief illustration of how such errors can be detected follows the list below.

  • The issue stems from problems in the fine-tuning process rather than the reasoning capabilities themselves
  • Fine-tuning requires careful balance to maintain multiple capabilities while enhancing specific features
  • According to Vectara’s head of developer relations Ofer Mendelevitch, DeepSeek will likely address these issues in future updates
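One common way to quantify hallucination, and roughly the spirit of Vectara’s leaderboard methodology, is to have a model summarize source documents and then check whether each generated claim is actually supported by its source. The sketch below is a minimal, hypothetical illustration using an off-the-shelf natural-language-inference classifier; the model checkpoint, example text, and decision rule are illustrative assumptions, not Vectara’s actual evaluation pipeline.

```python
# Minimal, hypothetical sketch of hallucination detection via NLI.
# Assumptions: the cross-encoder/nli-deberta-v3-base checkpoint and the
# "not entailed => hallucination" rule are illustrative choices,
# not Vectara's HHEM setup.
from transformers import pipeline

# An NLI cross-encoder scores whether a claim is supported by a source passage.
nli = pipeline("text-classification", model="cross-encoder/nli-deberta-v3-base")

source = "The company reported third-quarter revenue of $2.1 billion."
claim = "The company reported third-quarter revenue of $3.4 billion."  # fabricated figure

result = nli([{"text": source, "text_pair": claim}])[0]
print(result)  # e.g. {'label': 'contradiction', 'score': ...}

# Flag the response as a hallucination when the source does not entail it.
is_hallucination = result["label"].lower() != "entailment"
print("hallucination:", is_hallucination)
```

In practice, an evaluation along these lines applies such a consistency check across a large set of documents and reports the fraction of model outputs flagged as unsupported.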

Independent verification: Recent testing by Wired writer Reece Rogers corroborates Vectara’s findings about DeepSeek’s accuracy issues.

  • Rogers identified both hallucination and moderation problems during his evaluation
  • Questions remain about the training data used to develop the model
  • Despite these issues, Rogers suggested DeepSeek could be a significant competitor to U.S.-based AI companies

Looking ahead: While DeepSeek’s current performance raises concerns about reliability, the broader trend suggests that reasoning models will continue to improve through iterative development and refined training methods. The challenge lies in maintaining multiple capabilities while enhancing specific features like reasoning, highlighting the complexity of developing advanced AI systems.
