Understanding and Improving Ensemble Adversarial Defense
This work addresses the problem of improving adversarial robustness in machine learning models for security-critical applications, offering a theoretical and practical approach that builds on existing ensemble techniques.
The paper addresses the lack of theoretical understanding of why ensemble adversarial defense improves robustness. It develops a new error theory that demonstrates a provable 0-1 loss reduction and proposes an interactive global adversarial training (iGAT) method that boosts the performance of existing ensemble defenses by up to 17% on the CIFAR10 and CIFAR100 datasets under white-box and black-box attacks.
Ensembling has become a popular strategy in adversarial defense: multiple base classifiers are trained to defend against adversarial attacks in a cooperative manner. Despite its empirical success, a theoretical explanation of why an ensemble of adversarially trained classifiers is more robust than a single one is still lacking. To fill this gap, we develop a new error theory dedicated to understanding ensemble adversarial defense, demonstrating a provable 0-1 loss reduction on challenging sample sets in an adversarial defense scenario. Guided by this theory, we propose an effective approach to improve ensemble adversarial defense, named interactive global adversarial training (iGAT). The proposal includes (1) a probabilistic distributing rule that selectively allocates to different base classifiers adversarial examples that are globally challenging to the ensemble, and (2) a regularization term to rescue the most severe weaknesses of the base classifiers. Tested on various existing ensemble adversarial defense techniques, iGAT boosts their performance by up to 17%, evaluated on the CIFAR10 and CIFAR100 datasets under both white-box and black-box attacks.
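To make the distributing idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the ensemble combines base classifiers by averaging their class probabilities, treats an example as "globally challenging" when the averaged prediction incurs a 0-1 loss of 1, and uses a hypothetical allocation rule that draws a base classifier with probability proportional to the score it gives the true class. The function names, the allocation rule, and the toy probabilities are all illustrative assumptions; the abstract does not specify these details.

```python
import random

def ensemble_average(base_probs):
    """Average the class-probability vectors of the base classifiers."""
    n = len(base_probs)
    num_classes = len(base_probs[0])
    return [sum(p[c] for p in base_probs) / n for c in range(num_classes)]

def zero_one_loss(probs, label):
    """0-1 loss: 1 if the arg-max class differs from the label, else 0."""
    pred = max(range(len(probs)), key=lambda c: probs[c])
    return 0 if pred == label else 1

def globally_challenging(all_base_probs, labels):
    """Indices of examples that the averaged ensemble misclassifies."""
    return [
        i for i, (bp, y) in enumerate(zip(all_base_probs, labels))
        if zero_one_loss(ensemble_average(bp), y) == 1
    ]

def distribute(all_base_probs, labels, indices, rng):
    """Assign each challenging example to one randomly drawn base classifier.

    Hypothetical rule (an assumption, not taken from the paper): classifier k
    is drawn with probability proportional to its true-class score.
    """
    n_classifiers = len(all_base_probs[0])
    allocation = {k: [] for k in range(n_classifiers)}
    for i in indices:
        weights = [bp[labels[i]] for bp in all_base_probs[i]]
        if sum(weights) == 0:
            # Fall back to a uniform draw when every score is zero.
            weights = [1.0] * n_classifiers
        k = rng.choices(range(n_classifiers), weights=weights, k=1)[0]
        allocation[k].append(i)
    return allocation

# Toy data (made up): 3 adversarial examples, 2 base classifiers, 2 classes.
# all_base_probs[i][k] is classifier k's probability vector for example i.
all_base_probs = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.6, 0.4], [0.2, 0.8]],
    [[0.3, 0.7], [0.1, 0.9]],
]
labels = [0, 0, 0]

challenging = globally_challenging(all_base_probs, labels)
allocation = distribute(all_base_probs, labels, challenging, random.Random(0))
```

In a training loop, each base classifier would then be updated on the adversarial examples allocated to it. The regularization term from item (2) is not sketched here, since its form is not given in the abstract.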