New research testing six leading AI agents on real freelance work reveals these automated systems can complete fewer than 3% of assigned tasks, earning just $1,810 out of a possible $143,991 in simulated projects. The study by the Center for AI Safety, a nonprofit research organization, and Scale AI, a major data annotation company, exposes a massive gap between AI industry promises and actual performance, suggesting that widespread job automation remains far from reality despite aggressive corporate adoption.
What you should know: The Remote Labor Index benchmark tested AI agents across diverse real-world freelance projects, ranging from game development to data analysis.
- China-based startup Manus performed best with a 2.5% automation rate, meaning it delivered acceptable work on just 2.5% of assigned projects.
- Elon Musk’s Grok 4 and Anthropic’s Claude Sonnet 4.5 tied for second place at 2.1%, despite Claude being marketed as the “best coding model in the world.”
- OpenAI’s GPT-5, touted for “PhD level” intelligence, managed a completion rate of just 1.7%.
The performance rankings: Even the most advanced AI models struggled dramatically with basic freelance tasks.
- ChatGPT Agent, OpenAI’s dedicated AI agent tool, barely reached 1.3% completion.
- Google’s Gemini 2.5 Pro performed worst at 0.8%, demonstrating the industry-wide challenge.
- No AI agent exceeded 3% task completion across any category tested.
Why this matters: Companies are aggressively replacing human workers with AI despite mounting evidence that automation isn’t delivering promised productivity gains.
- One MIT study found 95% of companies piloting AI initiatives saw no meaningful revenue growth.
- Research shows AI tools often create “workslop”—low-quality output requiring extensive human revision that creates workplace tension.
- Many executives who replaced employees with AI have been forced to rehire them after discovering the technology’s limitations.
The big picture: AI agents face fundamental technical barriers that prevent effective job replacement.
- “They don’t have long-term memory storage and can’t do continual learning from experiences. They can’t pick up skills on the job like humans,” CAIS director Dan Hendrycks explained.
- The gap between AI marketing claims and real-world performance suggests current automation capabilities are vastly oversold.
- Despite these findings, AI-related layoffs continue accelerating across industries.
What they’re saying: Researchers emphasize the importance of realistic AI capability assessments.
- “I should hope this gives much more accurate impressions as to what’s going on with AI capabilities,” Hendrycks told Wired.
- “We have debated AI and jobs for years, but most of it has been hypothetical or theoretical,” noted Scale AI’s director of research Bing Lie.
A New Paper Tested AI's Ability to Do Actual Online Freelance Work, and the Results Are Damning