LG, ML | Jun 24, 2021

On the (Un-)Avoidability of Adversarial Examples

arXiv:2106.13326v1 | 3.11 citations

Originality: Incremental advance

AI Analysis

This work addresses the reliability of deep learning models under adversarial attacks and offers a theoretical foundation for adaptive robustness. It is rated incremental because it builds on existing research on adversarial examples.

The paper tackles adversarial examples in deep neural networks by asking when a label change under a small perturbation is justified. It defines adversarial robustness as a locally adaptive measure and develops an adaptive data-augmentation method that maintains consistency of 1-nearest neighbor classification under deterministic labels.

The phenomenon of adversarial examples in deep learning models has caused substantial concern over their reliability. While many deep neural networks have shown impressive performance in terms of predictive accuracy, it has been shown that in many instances an imperceptible perturbation can falsely flip the network's prediction. Most research has then focused on developing defenses against adversarial attacks or learning under a worst-case adversarial loss. In this work, we take a step back and aim to provide a framework for determining whether a model's label change under small perturbation is justified (and when it is not). We carefully argue that adversarial robustness should be defined as a locally adaptive measure complying with the underlying distribution. We then suggest a definition for an adaptive robust loss, derive an empirical version of it, and develop a resulting data-augmentation framework. We prove that our adaptive data-augmentation maintains consistency of 1-nearest neighbor classification under deterministic labels and provide illustrative empirical evaluations.
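
The idea of a locally adaptive robustness radius combined with data augmentation can be illustrated with a small sketch. This is not the paper's definition: the abstract does not state the adaptive loss or the augmentation rule, so the code below assumes a simple stand-in in which each training point's radius is a fixed fraction of its distance to the nearest differently labeled training point. The function names (adaptive_radius, augment, predict_1nn) and the parameters fraction and per_point are hypothetical and chosen only for illustration.

```python
import math
import random


def adaptive_radius(i, points, labels, fraction=0.5):
    """ASSUMED rule (not from the paper): radius of point i is `fraction`
    times its distance to the nearest training point with a different label."""
    dists = [
        math.dist(points[i], q)
        for q, label_q in zip(points, labels)
        if label_q != labels[i]
    ]
    # If every point shares the same label there is no boundary to respect
    # in this sketch, so we simply do not perturb.
    return fraction * min(dists) if dists else 0.0


def augment(points, labels, per_point=4, fraction=0.5, seed=0):
    """Add `per_point` perturbed copies of each point, each within that
    point's own adaptive radius and carrying the original label."""
    rng = random.Random(seed)
    aug_points, aug_labels = list(points), list(labels)
    for i, (p, y) in enumerate(zip(points, labels)):
        r = adaptive_radius(i, points, labels, fraction)
        for _ in range(per_point):
            # Random direction, rescaled so the perturbation length is < r.
            direction = [rng.gauss(0.0, 1.0) for _ in p]
            norm = math.sqrt(sum(d * d for d in direction)) or 1.0
            scale = r * rng.random() / norm
            aug_points.append(tuple(a + scale * d for a, d in zip(p, direction)))
            aug_labels.append(y)
    return aug_points, aug_labels


def predict_1nn(x, points, labels):
    """Plain 1-nearest-neighbor prediction under Euclidean distance."""
    nearest = min(range(len(points)), key=lambda k: math.dist(x, points[k]))
    return labels[nearest]


# Toy data: two points of class 0 on the left, one point of class 1 on the right.
train_x = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
train_y = [0, 0, 1]
aug_x, aug_y = augment(train_x, train_y)
```

In this sketch, points far from any differently labeled neighbor receive large perturbations, while points near a label change receive small ones, in contrast to a single global perturbation budget. Whether this particular rule preserves the consistency result stated in the abstract is not claimed here.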
