New research testing six leading AI agents on real freelance work reveals these automated systems can complete fewer than 3% of assigned tasks, earning just $1,810 out of a possible $143,991 in simulated projects. The study by the Center for AI Safety, a nonprofit research organization, and Scale AI, a major data annotation company, exposes a massive gap between AI industry promises and actual performance, suggesting that widespread job automation remains far from reality despite aggressive corporate adoption.
What you should know: The Remote Labor Index benchmark tested AI agents across diverse real-world freelance projects, ranging from game development to data analysis.
- China-based startup Manus performed best with a 2.5% automation rate, meaning it delivered acceptable work on just 2.5% of assigned projects.
- Elon Musk’s Grok 4 and Anthropic’s Claude Sonnet 4.5 tied for second place at 2.1%, despite Claude being marketed as the “best coding model in the world.”
- OpenAI’s GPT-5, touted for “PhD level” intelligence, managed a completion rate of just 1.7%.
The performance rankings: Even the most advanced AI models struggled dramatically with basic freelance tasks.
- ChatGPT Agent, OpenAI’s dedicated AI agent tool, barely reached 1.3% completion.
- Google’s Gemini 2.5 Pro performed worst at 0.8%, demonstrating the industry-wide challenge.
- No AI agent exceeded 3% task completion across any category tested.
Why this matters: Companies are aggressively replacing human workers with AI despite mounting evidence that automation isn’t delivering promised productivity gains.
- One MIT study found 95% of companies piloting AI initiatives saw no meaningful revenue growth.
- Research shows AI tools often create “workslop”—low-quality output requiring extensive human revision that creates workplace tension.
- Many executives who replaced employees with AI have been forced to rehire them after discovering the technology’s limitations.
The big picture: AI agents face fundamental technical barriers that prevent effective job replacement.
- “They don’t have long-term memory storage and can’t do continual learning from experiences. They can’t pick up skills on the job like humans,” CAIS director Dan Hendrycks explained.
- The gap between AI marketing claims and real-world performance suggests current automation capabilities are vastly oversold.
- Despite these findings, AI-related layoffs continue accelerating across industries.
What they’re saying: Researchers emphasize the importance of realistic AI capability assessments.
- “I should hope this gives much more accurate impressions as to what’s going on with AI capabilities,” Hendrycks told Wired.
- “We have debated AI and jobs for years, but most of it has been hypothetical or theoretical,” noted Scale AI’s director of research Bing Lie.
A New Paper Tested AI's Ability to Do Actual Online Freelance Work, and the Results Are Damning