Reddit says it caught AI search startup Perplexity red-handed using its content, thanks to a copyright trap that revealed how the company circumvented Reddit’s attempts to block unauthorized data scraping. The social media platform planted a “mountweazel,” a deliberately fake entry designed to catch copiers: a test post accessible only to Google’s search engine, which appeared in Perplexity’s results within hours.
The big picture: Reddit’s lawsuit against Perplexity and three other data scraping companies exposes the murky world of AI training data acquisition, where companies are finding increasingly sophisticated ways to harvest copyrighted content without permission.
How the trap worked: Reddit created a test post that could “only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet.”
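The idea behind such a trap can be sketched in a few lines. This is a minimal illustration of a canary marker check, not Reddit's actual implementation; the function names `make_canary` and `contains_canary` and the sample strings are hypothetical. A unique marker is embedded in the test post, and any third-party output that reproduces the marker must have come from that post.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Return a unique marker string that is very unlikely to occur anywhere else."""
    return f"{prefix}-{secrets.token_hex(8)}"


def contains_canary(text: str, canary: str) -> bool:
    """Return True if the marker appears in the given text (case-insensitive)."""
    return canary.lower() in text.lower()


# Hypothetical example data: a test post carrying the marker,
# and two answers returned by some third-party service.
canary = make_canary()
post_body = f"Test post containing the marker {canary}."
copied_answer = f"According to one post: {post_body}"
unrelated_answer = "An answer that does not draw on the test post."

print(contains_canary(copied_answer, canary))
print(contains_canary(unrelated_answer, canary))
```

Because the marker is random, a match in someone else's output is strong evidence that the output was derived from the test post, which is the same reasoning a mountweazel relies on.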
What Reddit alleges: The company argues that “Perplexity’s business model is effectively to take Reddit’s content from Google search results,” then feed it into an AI model and “call it a new product.”
The broader context: The AI industry’s voracious appetite for training data has fundamentally disrupted the traditional web scraping ecosystem.
Reddit’s business strategy: The platform is simultaneously fighting unauthorized scraping while monetizing its own data through licensing deals.
Why this matters: This case highlights the growing tension between AI companies’ need for training data and content creators’ rights to control and monetize their intellectual property, and it could set precedents for how the industry handles data acquisition.