CR, CV | Jul 16, 2024

UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening

Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Hanxi Guo, Shiqing Ma, Xiangyu Zhang

arXiv:2407.11372v17.34 citations | h-index: 28 | Has Code

Originality: Highly original

AI Analysis

This work addresses the vulnerability of DNNs to advanced backdoor attacks, providing a more effective and cost-efficient mitigation method for security-critical applications.

The paper tackles backdoor attacks on deep neural networks by introducing UNIT, a post-training defense that eliminates backdoor effects by approximating and tightening each neuron's activation distribution. It outperforms 7 existing defense methods against 14 attacks while using only 5% of the clean training data.

Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attacker-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent advanced attacks. In this paper, we introduce a novel post-training defense technique, UNIT, that can effectively eliminate backdoor effects for a variety of attacks. Specifically, UNIT approximates a unique and tight activation distribution for each neuron in the model. It then proactively dispels substantially large activation values that exceed the approximated boundaries. Our experimental results demonstrate that UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks, including 2 advanced attacks, using only 5% of clean training data. UNIT is also cost-efficient. The code is accessible at https://github.com/Megum1/UNIT.
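
The core idea in the abstract, giving each neuron its own boundary and dispelling activations that exceed it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `estimate_bounds` and `tighten` are hypothetical, and the nearest-rank quantile rule used to pick each boundary is an assumption standing in for however UNIT actually approximates its boundaries. Clipping to the boundary is likewise only one plausible reading of "dispels".

```python
from typing import List, Sequence


def estimate_bounds(clean_activations: Sequence[Sequence[float]],
                    quantile: float = 0.95) -> List[float]:
    """Return one upper bound per neuron from activations on clean inputs.

    clean_activations[i][j] is the activation of neuron j on clean sample i.
    The quantile rule is an illustrative assumption, not the paper's procedure.
    """
    num_neurons = len(clean_activations[0])
    bounds = []
    for j in range(num_neurons):
        column = sorted(row[j] for row in clean_activations)
        # Nearest-rank index, clamped so quantile=1.0 selects the maximum.
        idx = min(len(column) - 1, int(quantile * len(column)))
        bounds.append(column[idx])
    return bounds


def tighten(activations: Sequence[float], bounds: Sequence[float]) -> List[float]:
    """Clip each neuron's activation to its own upper bound."""
    return [min(a, b) for a, b in zip(activations, bounds)]


# Toy data: 4 clean samples, 2 neurons.
clean = [[0.1, 1.0], [0.2, 1.2], [0.3, 0.9], [0.4, 1.1]]
bounds = estimate_bounds(clean, quantile=0.75)

# A hypothetical triggered input drives neuron 1 far outside its clean range.
suspicious = [0.25, 9.0]
tightened = tighten(suspicious, bounds)
```

In this toy setup the unusually large activation on the second neuron is pulled back to its clean-data bound, while the first neuron, which stays within its range, is left unchanged. A real defense would apply such bounds inside the network's layers and would need to choose them so that clean accuracy is preserved.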
