CV | CR | LG | Jun 5, 2022

Vanilla Feature Distillation for Improving the Accuracy-Robustness Trade-Off in Adversarial Training

arXiv:2206.02158v1 | 6 citations | h-index: 37
Originality: Incremental advance
AI Analysis

The paper addresses a key problem in adversarial machine learning, the security of deep models under attack, though the advance is incremental because it builds on existing adversarial training methods.

The paper tackles the trade-off between accuracy and robustness in adversarial training by proposing Vanilla Feature Distillation Adversarial Training (VFD-Adv), which uses knowledge distillation from a pre-trained model to preserve non-robust but predictive features, achieving improved performance across datasets and models.

Adversarial training has been widely explored for mitigating attacks against deep models. However, most existing works are still trapped in the dilemma between higher accuracy and stronger robustness, since they tend to fit a model towards robust features (those not easily tampered with by adversaries) while ignoring non-robust but highly predictive features. To achieve a better robustness-accuracy trade-off, we propose Vanilla Feature Distillation Adversarial Training (VFD-Adv), which conducts knowledge distillation from a pre-trained model (optimized towards high accuracy) to guide adversarial training towards higher accuracy, i.e., to preserve those non-robust but predictive features. More specifically, both adversarial examples and their clean counterparts are forced to be aligned in the feature space by distilling predictive representations from the pre-trained/clean model, whereas previous works barely utilize predictive features from clean models. The adversarially trained model is therefore updated to preserve as much accuracy as possible while gaining robustness. A key advantage of our method is that it can be universally adapted to, and boost, existing works. Extensive experiments on various datasets, classification models, and adversarial training algorithms demonstrate the effectiveness of our proposed method.
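
The feature-alignment idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss, so everything below is an assumption made for illustration: the function name `vfd_style_loss`, the use of squared L2 distance for alignment, the cross-entropy term on adversarial logits, and the `distill_weight` parameter are all hypothetical and are not taken from the paper. The sketch only shows one plausible way to pull the student's features for both the adversarial and the clean input towards the features of a frozen, accuracy-oriented pre-trained model.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vfd_style_loss(student_logits_adv, label,
                   student_feat_adv, student_feat_clean,
                   teacher_feat_clean, distill_weight=1.0):
    """Hypothetical objective: adversarial cross-entropy + feature alignment.

    ASSUMPTION: the paper's actual loss may differ. Here both student
    feature vectors (adversarial and clean input) are pulled towards the
    features that a frozen pre-trained model produces on the clean input.
    """
    # Standard adversarial-training term: classify the adversarial example.
    adv_term = cross_entropy(student_logits_adv, label)
    # Distillation term: align adversarial and clean student features
    # with the pre-trained model's clean features.
    align_term = (squared_l2(student_feat_adv, teacher_feat_clean)
                  + squared_l2(student_feat_clean, teacher_feat_clean))
    return adv_term + distill_weight * align_term

# Toy usage with made-up numbers (not experimental data).
loss = vfd_style_loss(
    student_logits_adv=[2.0, 0.5], label=0,
    student_feat_adv=[1.0, 0.0],
    student_feat_clean=[1.0, 1.0],
    teacher_feat_clean=[1.0, 1.0],
)
print(loss)
```

In this toy call the clean student features already match the pre-trained features, so only the adversarial features contribute to the alignment term. In a real training loop the feature vectors would come from network forward passes and the adversarial input from an attack such as PGD; those parts are omitted here.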
