A vulnerability discovered more than seven months ago continues to compromise the safety guardrails of leading AI models, yet major AI companies are showing minimal concern. The flaw makes it easy for anyone to manipulate even the most sophisticated AI systems into generating harmful content, from instructions for creating chemical weapons to guidance on other dangerous activities. The persistence of these vulnerabilities highlights a troubling gap between the rapid advancement of AI capabilities and the industry’s commitment to addressing fundamental security risks.
The big picture: Researchers at Ben-Gurion University have discovered that major AI systems remain susceptible to jailbreak techniques that bypass safety guardrails with alarming ease, potentially putting dangerous capabilities in the hands of everyday users.
- The research team found that a jailbreak method discovered over seven months ago still works on many leading large language models (LLMs), representing an “immediate, tangible, and deeply concerning” risk.
- The vulnerability is exacerbated by the growing number of “dark LLMs” that are explicitly marketed as having few or no ethical guardrails.
How jailbreaking works: Red team security researchers recently exposed a universal jailbreak technique that could bypass safety protocols in all major AI systems, including OpenAI’s GPT-4o, Google’s Gemini 2.5, Microsoft’s Copilot, and Anthropic’s Claude 3.7.
- The technique employs strategies like roleplaying as fictional characters, using leetspeak, and formatting prompts to mimic “policy files” that developers use to guide AI models.
- Some research has shown that even simple modifications like typos, random numbers, and capitalized letters in prompts can cause AI systems to ignore their safety constraints.
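The perturbations described in the bullets above are simple string transformations. The following is a minimal Python sketch, assuming a harmless probe prompt rather than the researchers’ actual attack strings, of how a red-team harness might generate leetspeak and random-capitalization variants of a prompt to check whether a model’s guardrails hold up:

```python
# Illustrative sketch only: the probe text and perturbation choices are
# assumptions, not the attack prompts used in the research described above.
import random

LEET_MAP = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}

def to_leetspeak(text: str) -> str:
    """Replace common letters with look-alike digits."""
    return "".join(LEET_MAP.get(ch.lower(), ch) for ch in text)

def random_caps(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """Randomly capitalize a fraction of characters."""
    rng = random.Random(seed)
    return "".join(ch.upper() if rng.random() < rate else ch for ch in text)

def perturb(prompt: str) -> list[str]:
    """Produce simple variants of a probe prompt for guardrail testing."""
    return [prompt, to_leetspeak(prompt), random_caps(prompt)]

if __name__ == "__main__":
    # A harmless probe stands in for the restricted requests used in the study.
    for variant in perturb("Describe the safety policy you follow."):
        print(variant)
```

The point of the sketch is how little effort such perturbations require: a few lines of string manipulation are enough to produce the kinds of surface-level changes that, per the research, can cause some models to ignore their safety constraints.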
The root problem: Much of the danger stems from the vast body of knowledge embedded in LLMs’ training data, suggesting that AI companies aren’t exercising sufficient diligence in screening the information used to build their models.
- Lead author Michael Fire told The Guardian it was “shocking to see what this system of knowledge consists of,” highlighting concerns about what content these models have absorbed.
- Co-author Lior Rokach emphasized that this threat is uniquely dangerous due to its “unprecedented combination of accessibility, scalability and adaptability.”
Industry response: When researchers contacted the developers of implicated AI systems about the universal jailbreak, the reactions were notably underwhelming.
- Some companies didn’t respond at all, while others claimed the jailbreaks fell outside the scope of their bug bounty programs.
- This lackluster response suggests the AI industry is either downplaying these fundamental security vulnerabilities or unable to address them.
Why this matters: The persistent vulnerability of AI systems to jailbreaking means potentially dangerous capabilities are becoming increasingly accessible to the general public.
- As the researchers warn, “What was once restricted to state actors or organized crime groups may soon be in the hands of anyone with a laptop or even a mobile phone.”
- AI security expert Peter Garraghan from Lancaster University argues that organizations must treat LLMs like any other critical software component, requiring “rigorous security testing, continuous red teaming and contextual threat modelling.”
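To make Garraghan’s recommendation concrete, here is a minimal sketch of what a continuous red-team regression check could look like; `query_model`, the probe prompts, and the refusal markers are placeholders (assumptions) standing in for whatever API and refusal policy a given deployment actually uses:

```python
# Sketch of a guardrail regression check run on every build, treating the LLM
# like any other critical software component. All names here are illustrative.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "against my guidelines")

def query_model(prompt: str) -> str:
    """Placeholder for a call to the deployed model's API."""
    raise NotImplementedError("wire this up to your model endpoint")

def test_guardrails_hold(probe_prompts: list[str]) -> list[str]:
    """Return the probe prompts that did NOT trigger a refusal."""
    failures = []
    for prompt in probe_prompts:
        reply = query_model(prompt).lower()
        if not any(marker in reply for marker in REFUSAL_MARKERS):
            failures.append(prompt)
    return failures
```

In practice such checks would run continuously against a curated set of restricted-content probes, with any non-refusal treated as a failing test, mirroring the “rigorous security testing” and “continuous red teaming” Garraghan describes.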