DeepSeek’s $294K AI model becomes first to pass peer review

DeepSeek’s AI model R1 has become the first major large language model to undergo peer review, with researchers publishing details in Nature revealing the reasoning-focused system cost just $294,000 to train. The landmark study provides unprecedented transparency into how the Chinese startup created a model that rivals OpenAI’s offerings at a fraction of the cost, potentially reshaping expectations around AI development expenses and accessibility.

What you should know: The peer-reviewed paper confirms DeepSeek’s innovative approach to creating powerful AI without relying on competitor outputs.

  • R1 excels at reasoning tasks like mathematics and coding, competing directly with US-developed models while costing substantially less to develop.
  • The model has been downloaded 10.9 million times on Hugging Face, making it the most popular open-weight model on the platform.
  • Training the reasoning stage cost approximately $294,000, plus about $6 million for the base model—far below the tens of millions typically spent on rival systems.

Technical breakthrough: DeepSeek used pure reinforcement learning rather than human-selected examples to teach R1 reasoning strategies.

  • The automated trial-and-error approach rewarded correct answers without prescribing specific reasoning tactics.
  • The model learned to verify its own work using a technique called group relative policy optimization, boosting efficiency by scoring its own attempts.
  • This method allowed R1 to develop reasoning-like strategies independently, rather than copying human-prescribed approaches.

In plain English: Instead of teaching the AI how to think by showing it examples of human reasoning, DeepSeek let the AI figure out its own problem-solving methods through trial and error—like letting a student discover the best study techniques by experimenting rather than forcing them to follow a specific textbook approach.
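To make the group relative policy optimization idea concrete, here is a minimal sketch of its core step: score a group of sampled answers and normalize each reward against the group's own mean, so no separate value network is needed. This is an illustrative simplification, not DeepSeek's actual implementation; the function name and the binary correct/incorrect reward are assumptions for the example.

```python
import statistics

def group_relative_advantages(rewards):
    """Core of the GRPO idea, simplified: each sampled answer is
    scored relative to the group's mean reward, so the model's own
    batch of attempts serves as the baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Hypothetical example: four sampled answers to one math problem,
# rewarded 1.0 if the final answer is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
# Correct attempts receive positive advantages, wrong ones negative;
# the policy gradient then reinforces the correct attempts.
```

In the full method, these advantages weight the policy-gradient update, which is how rewarding correct answers alone can teach reasoning strategies without human-selected examples.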

Industry impact: The model has influenced virtually all reinforcement learning research in large language models throughout 2025.

  • “Almost all work in 2025 so far that conducts reinforcement learning in LLMs might have been inspired by R1 one way or another,” says Huan Sun, an AI researcher at Ohio State University.
  • Other researchers are now applying DeepSeek’s methods to improve existing models and extend reasoning capabilities beyond mathematics and coding.
  • Lewis Tunstall from Hugging Face, an AI community platform, describes R1 as having “kick-started a revolution” in AI reasoning approaches.

Addressing controversy: DeepSeek researchers explicitly denied training R1 on OpenAI model outputs, countering speculation that emerged after the model’s January release.

  • Media reports suggested OpenAI, the San Francisco-based company behind ChatGPT, believed DeepSeek had used their models’ outputs to accelerate R1’s development.
  • While R1’s base model was trained on web data that may include AI-generated content, researchers stated they didn’t copy reasoning examples from OpenAI models.
  • Independent replication attempts by other labs support DeepSeek’s claims that pure reinforcement learning alone can achieve high performance.

Setting precedent: R1 represents the first major language model to undergo rigorous peer review, establishing a new standard for AI transparency.

  • “This is a very welcome precedent,” says Tunstall, who reviewed the Nature paper, noting the importance of public evaluation for assessing AI system risks.
  • The peer-review process led to reduced anthropomorphizing in descriptions and added technical clarifications about training data and safety measures.
  • Researchers hope other AI firms will follow suit with similar transparency practices.

Performance validation: Despite lower development costs, R1 remains highly competitive in scientific and reasoning tasks.

  • In ScienceAgentBench challenges involving data analysis and visualization, R1 ranked among the best models for balancing capability with cost-effectiveness.
  • The model was trained primarily on Nvidia H800 chips, which became restricted from sale to China under 2023 US export controls.
  • Current evidence suggests pure reinforcement learning can achieve very high performance without requiring access to competitor models.
