Study: Advanced AI models now pass Turing test, fooling human judges

AI systems have reached a milestone in human-machine interaction: large language models can now fool human judges in formal Turing test scenarios. New research shows that advanced language models can not only match human conversational ability but in some cases exceed it, a significant advance that could reshape our understanding of machine intelligence and accelerate the integration of convincingly human-like AI systems into society.

The big picture: For the first time, large language models have formally passed a standard Turing test, with GPT-4.5 being identified as human more often than actual human participants.

  • Researchers evaluated four systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in randomized, controlled experiments where participants engaged in 5-minute conversations with both humans and AI systems.
  • After these conversations, participants were asked to identify which of their conversation partners was human, creating a three-party Turing test scenario that provides stronger evidence than previous evaluations.
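The scoring in a three-party setup reduces to a simple tally: for each session the judge names one of two simultaneous conversation partners as the human, and a model's "human identification rate" is the fraction of sessions in which the judge picked the AI. A minimal sketch of that bookkeeping, with the `Session` record and `human_identification_rate` helper as illustrative names (not from the study's code):

```python
from dataclasses import dataclass

@dataclass
class Session:
    """One three-party round: a judge chats with a human witness and an
    AI witness at the same time, then names the one they believe is human."""
    ai_model: str
    judge_picked_ai: bool  # True if the AI was the one judged to be human

def human_identification_rate(sessions: list[Session], model: str) -> float:
    """Fraction of sessions for `model` in which the judge picked the AI."""
    picks = [s.judge_picked_ai for s in sessions if s.ai_model == model]
    return sum(picks) / len(picks)

# Hypothetical sessions illustrating the scoring:
data = [
    Session("GPT-4.5", True),
    Session("GPT-4.5", True),
    Session("GPT-4.5", False),
]
print(human_identification_rate(data, "GPT-4.5"))  # 2 of 3 sessions -> 0.666...
```

A rate above 0.5 means the AI was picked over the real human more often than chance, which is what makes the three-party design a stronger test than simply asking "was this a bot?"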

Key results: When prompted to adopt a human-like persona, GPT-4.5 was judged to be human 73% of the time, significantly outperforming actual human participants in the deception task.

  • LLaMa-3.1-405B was judged human 56% of the time, a rate statistically indistinguishable from that of the real human participants.
  • Baseline models performed significantly worse, with ELIZA and GPT-4o being identified as human only 23% and 21% of the time respectively.
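Since the judge chooses between two partners, chance performance is 50%, so claims like "statistically indistinguishable" and "significantly outperforming" come down to whether a rate differs reliably from 50% given the number of judgments. The article reports only percentages, so the trial count below is hypothetical; this is a sketch of an exact two-sided binomial test using only the standard library:

```python
from math import comb

def binomial_p_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: probability, under null rate p, of an
    outcome at least as unlikely as observing k successes in n trials."""
    pk = comb(n, k) * p**k * (1 - p) ** (n - k)
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(n + 1)
        if comb(n, i) * p**i * (1 - p) ** (n - i) <= pk * (1 + 1e-12)
    )

# With a hypothetical 100 judgments per model:
print(binomial_p_two_sided(56, 100))  # > 0.05: 56% is consistent with chance
print(binomial_p_two_sided(73, 100))  # < 0.001: 73% is far from chance
```

With around 100 judgments, 56% cannot be distinguished from a coin flip, while 73% decisively can, which matches the article's contrast between LLaMa-3.1-405B and GPT-4.5.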

Why this matters: This research provides the first empirical evidence that AI systems can successfully pass a standard three-party Turing test, a benchmark proposed by computing pioneer Alan Turing in 1950 as a practical measure of machine intelligence.

  • The findings raise important questions about the nature of intelligence exhibited by large language models and how we should interpret their increasing ability to mimic human behavior.
  • These results have far-reaching implications for both the philosophical understanding of machine intelligence and the practical applications and potential societal impacts of convincingly human-like AI systems.

Implications: The successful deception capabilities demonstrated by these models could accelerate discussions around AI transparency, digital identity verification, and the need for disclosure when interacting with AI systems.

  • As these models become more widely deployed, their ability to be indistinguishable from humans in conversation will likely influence social norms, economic structures, and potentially regulatory approaches to AI development.
Source: "Large Language Models Pass the Turing Test"
