LG CR Oct 26, 2023

CBD: A Certified Backdoor Detector Based on Local Dominant Probability

arXiv:2310.17498v214.927 citations | h-index: 9 | Has Code

Originality: Highly original

AI Analysis

This work addresses the security threat of backdoor attacks in deep learning models. It offers a certified detection method that is particularly effective against attacks with specific trigger characteristics, though its contribution is incremental in that it adds certification to existing detection approaches.

The paper tackles the problem of detecting backdoor attacks in deep neural networks by introducing CBD, the first certified backdoor detector based on local dominant probability and adjustable conformal prediction. The results show that CBD achieves comparable or higher detection accuracy than state-of-the-art methods while also providing detection certification, with empirical detection true positive rates of up to 100% on benchmark datasets such as GTSRB and SVHN.

Backdoor attacks are a common threat to deep neural networks. During testing, a backdoored model misclassifies samples embedded with a backdoor trigger to an adversarial target class, while correctly classifying samples without the trigger. In this paper, we present the first certified backdoor detector (CBD), which is built on a novel, adjustable conformal prediction scheme using our proposed statistic, local dominant probability. For any classifier under inspection, CBD provides 1) a detection inference, 2) the condition under which attacks are guaranteed to be detectable for the same classification domain, and 3) a probabilistic upper bound on the false positive rate. Our theoretical results show that attacks with triggers that are more resilient to test-time noise and have smaller perturbation magnitudes are more likely to be detected with guarantees. Moreover, we conduct extensive experiments on four benchmark datasets considering various backdoor types, such as BadNet, CB, and Blend. CBD achieves comparable or even higher detection accuracy than state-of-the-art detectors, and it additionally provides detection certification. Notably, for backdoor attacks with random perturbation triggers bounded by $\ell_2 \leq 0.75$ that achieve more than 90% attack success rate, CBD achieves 100% (98%), 100% (84%), 98% (98%), and 72% (40%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.
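
To make the conformal-prediction idea in the abstract more concrete, the following is a minimal sketch of a plain conformal test, not the paper's adjustable scheme. It assumes a scalar detection statistic has already been computed for the classifier under inspection and for a set of benign calibration classifiers; how CBD derives that statistic from local dominant probability is not described here, so the numbers below are purely hypothetical. The function names `conformal_p_value` and `detect` are illustrative and do not come from the paper or its code.

```python
from typing import Sequence


def conformal_p_value(calibration: Sequence[float], observed: float) -> float:
    """Conformal p-value of `observed` against benign calibration statistics.

    Assumption: larger statistics are more suspicious. The p-value is the
    fraction of values (calibration plus the observed one itself) that are
    at least as large as the observed statistic.
    """
    n_at_least = sum(1 for s in calibration if s >= observed)
    return (n_at_least + 1) / (len(calibration) + 1)


def detect(calibration: Sequence[float], observed: float, alpha: float = 0.1) -> bool:
    """Flag the inspected classifier when its p-value is at most alpha."""
    return conformal_p_value(calibration, observed) <= alpha


# Hypothetical statistics from 19 benign calibration classifiers.
benign_stats = [float(i) for i in range(1, 20)]

print(conformal_p_value(benign_stats, 10.0))  # typical value, large p-value
print(detect(benign_stats, 100.0))            # far above every benign value
```

If the benign calibration statistics and the statistic of a benign inspected classifier are exchangeable, a test of this form flags a benign classifier with probability at most `alpha`, which is the kind of false positive rate control that conformal methods are used for.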
