ULMFiT, not GPT-1, was the first true LLM according to new analysis

The development of Large Language Models (LLMs) has fundamentally transformed AI capabilities, but understanding their origins helps contextualize today’s rapid advancements. While GPT-4 and Claude dominate current discussions, identifying the first true LLM clarifies the evolutionary path of these increasingly sophisticated systems and provides valuable perspective on how quickly this technology has developed in just a few years.

The big picture: According to Australian tech blogger Jonathon Belotti, ULMFiT, published by Jeremy Howard and Sebastian Ruder in January 2018, represents the first true LLM, predating OpenAI’s GPT-1 by several months.

  • GPT-1, created by Alec Radford and colleagues at OpenAI, was published on June 11, 2018, several months after ULMFiT’s introduction.
  • Both models demonstrated the core capabilities that define modern LLMs, though at a far smaller scale than today’s systems.

What makes an LLM: Belotti defines an LLM as a language model effectively trained as a “next word predictor” that can be easily adapted to multiple text-based tasks without architectural changes.

  • The definition emphasizes self-supervised training on unlabeled text data, where the model learns simply by predicting the next word in raw text (see the sketch after this list).
  • True LLMs must achieve state-of-the-art performance across multiple text challenges with minimal adaptation.
  • This definition helps distinguish early language models from true LLMs based on their capabilities and adaptability.
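To make the “next word predictor” criterion concrete, here is a minimal sketch of self-supervised next-word prediction in PyTorch. It is not ULMFiT’s or GPT-1’s actual training code (ULMFiT uses an AWD-LSTM; GPT-1 uses a Transformer decoder); the tiny model, hyperparameters, and random token batch below are illustrative assumptions, chosen only to show that the training labels come from the text itself.

```python
# Minimal sketch of self-supervised next-word prediction (illustrative only;
# model size, hyperparameters, and data are placeholder assumptions).
import torch
import torch.nn as nn

class TinyLanguageModel(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)   # scores for the next token

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        hidden_states, _ = self.rnn(self.embed(tokens))
        return self.head(hidden_states)             # (batch, seq_len, vocab)

vocab_size = 1000
model = TinyLanguageModel(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Self-supervision: the "labels" are the same text shifted by one token,
# so no human annotation is required.
batch = torch.randint(0, vocab_size, (8, 33))        # stand-in for tokenized text
inputs, targets = batch[:, :-1], batch[:, 1:]        # predict token t+1 from tokens 1..t

optimizer.zero_grad()
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
print(f"next-word prediction loss: {loss.item():.3f}")
```

The adaptability half of the definition would then amount to keeping this pretrained backbone and attaching a small task head (for example, a classifier over its hidden states) rather than redesigning the architecture; ULMFiT’s published recipe does this with an AWD-LSTM backbone plus gradual unfreezing and discriminative learning rates, details omitted from the sketch above.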

Historical context: Belotti’s analysis examines several pre-2018 models, such as CoVe and ELMo, to determine whether they meet the criteria for being considered LLMs.

  • After that analysis, Belotti concludes that, while the call is debatable, ULMFiT most convincingly fits the definition of the first genuine LLM.
  • Earlier models lacked either the adaptability or performance characteristics that define modern LLMs.

Where we go from here: Despite the increasing multimodality of AI models, Belotti suggests the term “LLM” will likely persist in technical vernacular.

  • Like “GPU,” which still stands for graphics processing unit even though the chips now handle many non-graphics workloads, “LLM” may remain the standard term even as models evolve beyond pure language processing.
