LG · CR · CV · Jul 24, 2021

Adversarial training may be a double-edged sword

arXiv:2107.11671v1
Originality: Incremental advance
AI Analysis

This work examines, for practitioners in machine learning security, how the effectiveness of adversarial training depends on the threat model, highlighting potential trade-offs in robustness.

The paper investigates adversarial training's impact on neural network robustness. It finds that adversarial training significantly improves resistance to white-box attacks, but may not provide a comparable robustness gain against more realistic decision-based black-box attacks, and that minimal-perturbation white-box attacks can even converge faster against adversarially trained networks.

Adversarial training has been shown to be an effective approach to improving the robustness of image classifiers against white-box attacks. However, its effectiveness against black-box attacks is more nuanced. In this work, we demonstrate that some geometric consequences of adversarial training on the decision boundary of deep networks give an edge to certain types of black-box attacks. In particular, we define a metric called robustness gain to show that while adversarial training dramatically improves robustness in white-box scenarios, it may not provide as large a robustness gain against the more realistic decision-based black-box attacks. Moreover, we show that even minimal-perturbation white-box attacks can converge faster against adversarially trained neural networks than against regularly trained ones.
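
The abstract names a robustness gain metric but does not give its formula. As a rough illustration only, the sketch below assumes one plausible form: the ratio of the median adversarial perturbation norm an attack needs against the adversarially trained network to the median norm it needs against the regular network. The function name `robustness_gain`, this ratio-of-medians definition, and all numbers are assumptions made for illustration, not the paper's definition or results.

```python
from statistics import median
from typing import Sequence


def robustness_gain(norms_regular: Sequence[float],
                    norms_robust: Sequence[float]) -> float:
    """Hypothetical robustness gain: ratio of median perturbation norms.

    norms_regular: per-image perturbation norms found by one attack
                   against the regularly trained network.
    norms_robust:  norms found by the same attack against the
                   adversarially trained network.
    NOTE: this definition is an assumption, not taken from the paper.
    """
    if len(norms_regular) == 0 or len(norms_robust) == 0:
        raise ValueError("both norm lists must be non-empty")
    baseline = median(norms_regular)
    if baseline <= 0:
        raise ValueError("median baseline norm must be positive")
    return median(norms_robust) / baseline


# Made-up norms for illustration only; these are not measurements.
white_box_gain = robustness_gain([0.2, 0.25, 0.3], [1.0, 1.25, 1.5])
black_box_gain = robustness_gain([2.0, 2.5, 3.0], [3.0, 3.75, 4.5])

print(f"white-box gain: {white_box_gain:.2f}")
print(f"black-box gain: {black_box_gain:.2f}")
```

Under this assumed definition, a gain well above 1 means the attack needs much larger perturbations after adversarial training, while a gain close to 1 means adversarial training bought little robustness against that attack. The abstract's claim corresponds to the white-box gain being much larger than the decision-based black-box gain.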

