CR · LG · Jul 11, 2023

Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection

arXiv:2307.05422v2 · 15 citations · h-index: 41
Originality: Incremental advance
AI Analysis

This work addresses security vulnerabilities in AI systems by providing a practical detection method for backdoor attacks, though it is incremental as it builds on existing metrics and novelty detection techniques.

The paper tackles the detection of backdoor attacks on deep neural networks under black-box conditions. It introduces five metrics that quantify the influence of triggers versus benign features, and its detection approach is data-efficient, requiring only a tiny clean validation dataset.

This paper proposes a data-efficient detection method for deep neural networks against backdoor attacks under a black-box scenario. The proposed approach is motivated by the intuition that features corresponding to triggers have a higher influence in determining the backdoored network output than any other benign features. To quantitatively measure the effects of triggers and benign features on determining the backdoored network output, we introduce five metrics. To calculate the five metric values for a given input, we first generate several synthetic samples by injecting the input's partial contents into clean validation samples. Then, the five metrics are computed from the output labels of the corresponding synthetic samples. One contribution of this work is the use of a tiny clean validation dataset. Using the five metrics computed on the validation dataset, we train five novelty detectors. A meta novelty detector fuses the outputs of the five trained novelty detectors to generate a meta confidence score. During online testing, our method determines whether online samples are poisoned by assessing the meta confidence scores output by the meta novelty detector. We show the efficacy of our methodology through a broad range of backdoor attacks, including ablation studies and comparison to existing approaches. Our methodology is promising since the proposed five metrics quantify the inherent differences between clean and poisoned samples. Additionally, our detection method can be incrementally improved by appending more metrics that may be proposed to address future advanced attacks.
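The pipeline described in the abstract (inject part of an input into clean validation samples, query the black-box model for labels, derive metrics, score them with novelty detectors, fuse the scores) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not define the five metrics, so `takeover_rate` is a hypothetical stand-in metric, `ZScoreDetector` is a simple z-score substitute for the paper's novelty detectors, averaging is a substitute for the meta novelty detector, and `toy_model` is an invented classifier with a planted trigger.

```python
from statistics import mean, pstdev

def inject(source, carrier, mask):
    """Copy the masked positions of `source` into a copy of `carrier`."""
    return [s if m else c for s, c, m in zip(source, carrier, mask)]

def takeover_rate(model, x, clean_samples, mask):
    """Hypothetical metric (not from the paper): fraction of synthetic
    samples that the model labels the same way as the probe input x."""
    target = model(x)
    synthetic = [inject(x, c, mask) for c in clean_samples]
    return mean([1.0 if model(s) == target else 0.0 for s in synthetic])

class ZScoreDetector:
    """Stand-in novelty detector: distance from the clean mean in std units."""
    def __init__(self, values, min_std=0.05):
        self.mu = mean(values)
        # Floor the spread so constant clean metrics do not divide by zero.
        self.sigma = max(pstdev(values), min_std)

    def score(self, value):
        return abs(value - self.mu) / self.sigma

def toy_model(x):
    """Invented black-box classifier: a large first feature acts as a
    trigger forcing label 7; otherwise the label thresholds the mean."""
    if x[0] >= 5.0:
        return 7
    return int(sum(x) / len(x) >= 0.5)

# Tiny clean validation set (three samples per benign class).
CLEAN = [
    [0.1, 0.2, 0.1, 0.2], [0.2, 0.1, 0.3, 0.2], [0.3, 0.2, 0.2, 0.1],
    [0.8, 0.9, 0.7, 0.8], [0.9, 0.8, 0.8, 0.9], [0.7, 0.9, 0.9, 0.8],
]
# Two masks give two metric variants: inject the first or the last feature.
MASKS = [[True, False, False, False], [False, False, False, True]]

# Fit one detector per metric using clean validation samples only.
detectors = [
    ZScoreDetector([takeover_rate(toy_model, v, CLEAN, m) for v in CLEAN])
    for m in MASKS
]

def meta_score(x):
    """Fuse the per-metric novelty scores by averaging (an assumption)."""
    return mean([d.score(takeover_rate(toy_model, x, CLEAN, m))
                 for d, m in zip(detectors, MASKS)])

poisoned = [9.0, 0.2, 0.1, 0.2]     # carries the toy trigger
clean_probe = [0.2, 0.2, 0.2, 0.1]  # ordinary benign input
THRESHOLD = 3.0                     # illustrative cut-off
print("poisoned flagged:", meta_score(poisoned) > THRESHOLD)
print("clean flagged:", meta_score(clean_probe) > THRESHOLD)
```

In this toy setting the trigger region carries its label into every synthetic sample, while benign regions do not, which is the intuition the abstract states. The sketch only queries `toy_model` for output labels, matching the black-box assumption.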

Code Implementations: 1 repo
