Researchers at the University of Missouri have developed PEARL, an AI system that uses large language models to detect hardware trojans in computer chips with up to 97% accuracy. While this represents a significant advancement in securing the global chip supply chain, experts warn that the remaining 3% margin for error could still allow catastrophic vulnerabilities to slip through in critical systems like defense networks and medical equipment.
What you should know: Hardware trojans are malicious alterations secretly embedded during chip manufacturing that can remain dormant until activated to steal data or cause device failures.
- These threats can be inserted at nearly any stage of the complex global supply chain, where design, testing, and assembly often involve multiple firms across different countries.
- Once deployed, trojans in compromised chips can affect everything from data centers to medical equipment and defense systems.
- Detection and removal costs are substantial, and severe cases can force companies to recall entire product lines.
How PEARL works: The system applies large language models including GPT-3.5 Turbo, Gemini 1.5 Pro, Llama 3.1, and DeepSeek-V2 to identify malicious code in chip designs.
- PEARL uses in-context learning techniques (zero-shot, one-shot, and few-shot prompting) to detect trojans in Verilog, a hardware description language used to design chips, without requiring a model to be trained from scratch.
- The system provides human-readable explanations describing why specific code sections were classified as malicious, improving transparency for engineers.
- Unlike traditional methods, PEARL operates without needing a “golden model” (a clean reference chip for comparison), enabling broader practical application.
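To make the in-context learning idea concrete, here is a minimal sketch of how zero-shot, one-shot, and few-shot prompts for a Verilog snippet might be assembled. It is not PEARL's actual prompt or code: the instruction wording, the `Example` class, the `build_prompt` helper, and both sample modules are hypothetical, and the step of sending the prompt to an LLM is left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Example:
    """A labeled Verilog snippet used as an in-context demonstration (hypothetical)."""
    verilog: str
    label: str   # "trojan" or "clean"
    reason: str  # short human-readable explanation


# Assumed instruction text; the study's real prompt wording is not given in the source.
INSTRUCTION = (
    "You are a hardware security reviewer. Classify the Verilog module as "
    "'trojan' or 'clean' and explain your reasoning in one or two sentences."
)

# Illustrative module: a rarely reached counter value leaks the key.
TROJAN_EXAMPLE = Example(
    verilog=(
        "module leak(input clk, input [7:0] key, output reg [7:0] out);\n"
        "  reg [15:0] cnt = 0;\n"
        "  always @(posedge clk) begin\n"
        "    cnt <= cnt + 1;\n"
        "    out <= (cnt == 16'hFFFF) ? key : 8'h00;\n"
        "  end\n"
        "endmodule"
    ),
    label="trojan",
    reason="A rare counter value triggers leaking the key on an output.",
)

# Illustrative module with no hidden trigger.
CLEAN_EXAMPLE = Example(
    verilog=(
        "module and2(input a, input b, output y);\n"
        "  assign y = a & b;\n"
        "endmodule"
    ),
    label="clean",
    reason="The output depends only on the declared inputs; no hidden trigger.",
)


def build_prompt(target: str, examples: Sequence[Example] = ()) -> str:
    """Zero-shot if `examples` is empty, one-shot with one, few-shot with more."""
    parts = [INSTRUCTION]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}:\n{ex.verilog}\nLabel: {ex.label}\nReason: {ex.reason}"
        )
    # The prompt ends with an open "Label:" for the model to complete.
    parts.append(f"Module to classify:\n{target}\nLabel:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    target = "module m(input a, output y);\n  assign y = ~a;\nendmodule"
    print(build_prompt(target, [TROJAN_EXAMPLE, CLEAN_EXAMPLE]))
```

Calling `build_prompt(target)` with no examples gives the zero-shot variant. Passing one or several labeled examples gives the one-shot and few-shot variants, with no change to the model itself.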
Key performance metrics: Enterprise (proprietary) LLMs outperformed open-source alternatives when tested on industry-standard benchmarks.
- GPT-3.5 Turbo achieved up to 97% accuracy in detecting previously unknown hardware trojans.
- Open-source models like DeepSeek-V2 reached approximately 91% accuracy.
- Testing used the Trust-Hub and ISCAS’85/’89 datasets, standard benchmarks in the hardware security field.
Why the remaining 3% matters: Even minor detection gaps could have devastating consequences given chips’ critical role in essential infrastructure.
- A single undetected trojan could compromise financial networks, national defense operations, or life-supporting medical devices.
- Trojans continue to grow more sophisticated, making perfect detection increasingly difficult.
- High-stakes industries require additional layers of manual verification and testing beyond AI-driven detection alone.
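A rough back-of-the-envelope calculation shows why a 3% gap compounds. The sketch below makes two simplifying assumptions that the study does not claim: that 97% accuracy can be read as a 97% chance of catching each individual trojan, and that each detection is independent of the others. The function name `chance_of_a_miss` is made up for illustration.

```python
def chance_of_a_miss(detection_rate: float, n_trojans: int) -> float:
    """Probability that at least one of n trojans goes undetected.

    Assumes (hypothetically) a fixed per-trojan detection rate and
    independent detections: P(miss at least one) = 1 - rate ** n.
    """
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    if n_trojans < 0:
        raise ValueError("n_trojans must be non-negative")
    return 1.0 - detection_rate ** n_trojans


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(f"{n:>3} trojans: {chance_of_a_miss(0.97, n):.1%} chance at least one slips through")
```

Under these assumptions, the chance of missing at least one trojan rises from 3% for a single compromised design to about 26% for 10 and about 78% for 50. That compounding is one reason a second layer of verification matters.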
What the researchers acknowledge: The study’s authors recognize that achieving perfect trojan detection remains unattainable with current technology, particularly as threat actors develop more sophisticated attack methods.
New AI model spots dangerous chip code with near-perfect accuracy