LG · Dec 16, 2020

TrojanZoo: Towards Unified, Holistic, and Practical Evaluation of Neural Backdoors

Ren Pang, Zheng Zhang, Xiangshan Gao, Zhaohan Xi, Shouling Ji, Peng Cheng, Xiapu Luo, Ting Wang

arXiv:2012.09302v415.046 citations · Has Code

Originality: Highly original

AI Analysis

This platform addresses the lack of unified evaluation benchmarks for neural backdoor attacks and defenses, which is a critical problem for researchers and practitioners in deep learning security.

The paper introduces TrojanZoo, an open-source platform for evaluating neural backdoor attacks and defenses in deep learning systems, specifically in computer vision. It incorporates 8 attacks, 14 defenses, 6 attack metrics, and 10 defense metrics, revealing complex trade-offs among attack effectiveness, evasiveness, and transferability, and offering insights into improving existing methods.

Neural backdoors represent one primary threat to the security of deep learning systems. The intensive research has produced a plethora of backdoor attacks/defenses, resulting in a constant arms race. However, due to the lack of evaluation benchmarks, many critical questions remain under-explored: (i) what are the strengths and limitations of different attacks/defenses? (ii) what are the best practices to operate them? and (iii) how can the existing attacks/defenses be further improved? To bridge this gap, we design and implement TROJANZOO, the first open-source platform for evaluating neural backdoor attacks/defenses in a unified, holistic, and practical manner. Thus far, focusing on the computer vision domain, it has incorporated 8 representative attacks, 14 state-of-the-art defenses, 6 attack performance metrics, 10 defense utility metrics, as well as rich tools for in-depth analysis of the attack-defense interactions. Leveraging TROJANZOO, we conduct a systematic study on the existing attacks/defenses, unveiling their complex design spectrum: both manifest intricate trade-offs among multiple desiderata (e.g., the effectiveness, evasiveness, and transferability of attacks). We further explore improving the existing attacks/defenses, leading to a number of interesting findings: (i) one-pixel triggers often suffice; (ii) training from scratch often outperforms perturbing benign models to craft trojan models; (iii) optimizing triggers and trojan models jointly greatly improves both attack effectiveness and evasiveness; (iv) individual defenses can often be evaded by adaptive attacks; and (v) exploiting model interpretability significantly improves defense robustness. We envision that TROJANZOO will serve as a valuable platform to facilitate future research on neural backdoors.
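To make the "one-pixel trigger" finding concrete, the following is a minimal sketch of stamping a single-pixel trigger onto a batch of images and measuring how often a trojan model then predicts the attacker's target class. It is not TrojanZoo's API: the function names, the toy stand-in model, the random data, and the success-rate definition (fraction of non-target inputs classified as the target once triggered) are all assumptions made for illustration.

```python
import numpy as np


def stamp_one_pixel(images, row, col, value=1.0):
    """Return a copy of `images` (N, H, W, C) with one pixel overwritten in every image."""
    triggered = images.copy()
    triggered[:, row, col, :] = value
    return triggered


def toy_trojan_model(images, row, col, target_class, num_classes=10):
    """Hypothetical stand-in for a trojan model (not a real network).

    Predicts `target_class` whenever the trigger pixel is saturated in all
    channels; otherwise predicts a class derived from mean brightness.
    """
    mean_brightness = images.mean(axis=(1, 2, 3))
    benign_pred = np.minimum(
        (mean_brightness * num_classes).astype(int), num_classes - 1
    )
    has_trigger = np.all(images[:, row, col, :] >= 1.0, axis=-1)
    return np.where(has_trigger, target_class, benign_pred)


def attack_success_rate(preds_on_triggered, true_labels, target_class):
    """Fraction of inputs with a non-target true label that are classified as the target."""
    mask = true_labels != target_class
    if not mask.any():
        return float("nan")
    return float(np.mean(preds_on_triggered[mask] == target_class))


# Assumed toy data: pixel values stay below 1.0, so clean images never carry the trigger.
rng = np.random.default_rng(0)
images = rng.uniform(0.0, 0.9, size=(64, 8, 8, 3))
target = 7

# For this sketch, the model's clean predictions are treated as the ground-truth labels.
labels = toy_trojan_model(images, 0, 0, target)

triggered = stamp_one_pixel(images, 0, 0)
asr = attack_success_rate(toy_trojan_model(triggered, 0, 0, target), labels, target)
print(f"attack success rate on triggered inputs: {asr:.2f}")
```

In a real evaluation the toy model would be replaced by a trained network, and the same success-rate computation would be paired with clean accuracy to expose the effectiveness versus evasiveness trade-off the abstract describes.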
