LG, CR, CV · Nov 18, 2025

Certified but Fooled! Breaking Certified Defences with Ghost Certificates

arXiv:2511.14003v1
Originality: Highly original
AI Analysis

This work exposes critical vulnerabilities in certified robustness methods. The contribution is incremental but important for the security of AI systems.

The paper tackles the problem of bypassing certified defenses in machine learning. It crafts imperceptible adversarial perturbations that not only cause misclassification but also spoof robustness certificates, achieving large certification radii for incorrect classes. This is demonstrated on ImageNet against state-of-the-art defenses such as DensePure.
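For reference, here is a minimal statement of the certificate being spoofed. It assumes the probabilistic framework is Gaussian randomized smoothing, the usual basis of diffusion-denoised defenses such as DensePure. Here f is the base classifier, g the smoothed classifier, σ the noise level, \underline{p_A} a lower confidence bound on the top-class probability and \overline{p_B} an upper bound on the runner-up. Setting \overline{p_B} = 1 - \underline{p_A} gives the simplified radius. The last line, with source class y, target class t and perturbation δ, is our reading of the abstract's "ghost certificate", not the paper's stated formulation.

```latex
\begin{aligned}
g(x) &= \arg\max_{c}\; \Pr_{\varepsilon \sim \mathcal{N}(0,\sigma^2 I)}\big[f(x+\varepsilon) = c\big] \\
R &= \frac{\sigma}{2}\Big(\Phi^{-1}\big(\underline{p_A}\big) - \Phi^{-1}\big(\overline{p_B}\big)\Big)
  \;=\; \sigma\,\Phi^{-1}\big(\underline{p_A}\big) \quad \text{when } \overline{p_B} = 1 - \underline{p_A} \\
&\text{ghost certificate: } g(x+\delta) = t \neq y,\quad R_t(x+\delta) > R_y(x),\quad \|\delta\| \text{ small}
\end{aligned}
```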

Certified defenses promise provable robustness guarantees. We study the malicious exploitation of probabilistic certification frameworks to better understand the limits of the guarantees they provide. Here, the objective is not only to mislead a classifier but also to manipulate the certification process into generating a robustness guarantee for an adversarial input (certificate spoofing). A recent study at ICLR demonstrated that crafting large perturbations can shift inputs far into regions capable of generating a certificate for an incorrect class. Our study investigates whether the perturbations needed to cause a misclassification, while also coaxing a certified model into issuing a deceptive, large robustness radius for a target class, can still be made small and imperceptible. We explore the idea of region-focused adversarial examples to craft imperceptible perturbations, spoof certificates, and achieve certification radii larger than those of the source class (ghost certificates). Extensive evaluations on ImageNet demonstrate the ability to effectively bypass state-of-the-art certified defenses such as DensePure. Our work underscores the need to better understand the limits of robustness certification methods.
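The toy sketch below illustrates Monte Carlo certification and shows why a certificate can be issued for the wrong class. The smoothed classifier certifies whichever class dominates under noise, regardless of the input's true label. The 2-D base classifier, the parameter defaults, and the use of a Hoeffding bound in place of the Clopper-Pearson bound common in randomized-smoothing implementations are all assumptions made for illustration. This is not the paper's attack and not DensePure's code.

```python
import math
import random
from statistics import NormalDist


def base_classifier(x):
    """Hypothetical toy base classifier: class 1 if x[0] > 0, else class 0."""
    return 1 if x[0] > 0 else 0


def sample_counts(x, sigma, n, rng, num_classes=2):
    """Count base-classifier votes on n Gaussian-noised copies of x."""
    counts = [0] * num_classes
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[base_classifier(noisy)] += 1
    return counts


def certify(x, sigma, n0=100, n=10_000, alpha=0.001, seed=0):
    """Return (class, certified L2 radius), or (None, 0.0) to abstain.

    Assumption: a one-sided Hoeffding lower confidence bound stands in for
    the exact Clopper-Pearson bound usually used in randomized smoothing.
    """
    rng = random.Random(seed)
    # Step 1: guess the top class from a small selection sample.
    counts0 = sample_counts(x, sigma, n0, rng)
    top = max(range(len(counts0)), key=counts0.__getitem__)
    # Step 2: estimate that class's probability on a fresh, larger sample.
    counts = sample_counts(x, sigma, n, rng)
    p_hat = counts[top] / n
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # not confident enough: abstain
    # Step 3: radius R = sigma * Phi^{-1}(p_lower), the two-class simplification.
    return top, sigma * NormalDist().inv_cdf(p_lower)


if __name__ == "__main__":
    clean = [-1.0, 0.0]    # well inside the class-0 region
    shifted = [1.0, 0.0]   # pushed across the boundary into class 1
    print("clean:  ", certify(clean, sigma=0.5))
    print("shifted:", certify(shifted, sigma=0.5))
```

In this toy, `shifted` receives a class-1 certificate with roughly the same radius as the clean point's class-0 certificate, even if its true label is still 0. As the abstract describes it, the paper's aim is to obtain such wrong-class certificates with a small, imperceptible perturbation rather than the large shift used here.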
