Google’s AI scraping loophole reveals a significant gap between what the company publicly promises and what it actually does with website data. Google lets publishers opt out of AI training by its DeepMind unit, but that protection doesn’t extend to other parts of the company, including its search division, which builds AI products like Gemini and AI Overviews. The distinction shows how a tech giant can technically honor opt-out requests while still using the same data through internal organizational divisions.
The big picture: Google admitted in federal court that it continues training AI models on data from websites that explicitly opted out of such use, exploiting a technical loophole in its own policies.
Behind the numbers: Internal documents showed Google collected 160 billion tokens of AI training data, roughly half of which was allegedly removed because publishers had opted out of AI training.
Why this matters: Google’s approach effectively nullifies meaningful consent for publishers who want their content indexed in search but not used for AI training.