CEPA: Consensus Embedded Perturbation for Agnostic Detection and Inversion of Backdoors
This work addresses backdoor attacks on DNN classifiers, a practical security concern, and offers an incremental improvement over existing backdoor-agnostic detection methods.
The paper tackles the detection and mitigation of backdoor attacks on deep neural network classifiers. It introduces a backdoor-agnostic detector that estimates the backdoor and identifies its target class without needing the training dataset, and it reports favorable performance relative to existing defenses on CIFAR-10 and CIFAR-100.
A variety of defenses have been proposed against backdoor attacks on (Trojans planted in) deep neural network (DNN) classifiers. Backdoor-agnostic methods seek to reliably detect and/or mitigate backdoors irrespective of the incorporation mechanism used by the attacker, while inversion methods explicitly assume one. In this paper, we describe a new detector that: relies on embedded feature representations to estimate (invert) the backdoor and to identify its target class; can operate without access to the training dataset; and is highly effective for various incorporation mechanisms (i.e., is backdoor agnostic). Our detection approach is evaluated against an array of published defenses, for a variety of attacks on the CIFAR-10 and CIFAR-100 image-classification domains, and is found to compare favorably.
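The abstract does not state the decision rule used to identify the target class, so the following is only a generic illustration and should not be read as this paper's method. A common heuristic in inversion-based detection is to estimate one perturbation per putative target class and flag a class whose perturbation is an abnormally small outlier, using the median absolute deviation (MAD). The function name `flag_target_class`, the input `pert_norms`, and the threshold of 3.5 are all assumptions made for this sketch.

```python
from statistics import median


def flag_target_class(pert_norms, threshold=3.5):
    """Return the index of a class whose estimated perturbation norm is an
    abnormally small outlier, or None if no class stands out.

    pert_norms: hypothetical per-class perturbation sizes, one per putative
                target class (how they are estimated is outside this sketch).
    threshold:  assumed cutoff on the MAD-based anomaly score.
    """
    norms = [float(n) for n in pert_norms]
    med = median(norms)
    # 1.4826 scales the MAD so it is comparable to a standard deviation
    # when the inlier norms are roughly Gaussian.
    mad = 1.4826 * median([abs(n - med) for n in norms])
    if mad == 0:
        return None  # no spread among the norms, so nothing can be flagged
    # Only unusually SMALL norms count as suspicious, hence (med - n).
    scores = [(med - n) / mad for n in norms]
    candidate = max(range(len(scores)), key=scores.__getitem__)
    return candidate if scores[candidate] > threshold else None


# Illustrative, made-up norms for a 10-class problem.
suspicious = [10.1, 9.8, 10.3, 2.0, 9.9, 10.2, 10.0, 9.7, 10.4, 10.1]
clean = [10.1, 9.8, 10.3, 9.6, 9.9, 10.2, 10.0, 9.7, 10.4, 10.1]
print(flag_target_class(suspicious))
print(flag_target_class(clean))
```

In this sketch a one-sided score is used because a backdoor target class is expected to need a smaller perturbation than the others. A detector of the kind described above would supply its own statistic, computed from embedded feature representations, in place of the made-up norms.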