LG CR ML

Mar 25, 2019

Defending against Whitebox Adversarial Attacks via Randomized Discretization

arXiv:1903.10586v119.880 citations

Originality: Incremental advance

AI Analysis

This paper addresses the vulnerability of image classifiers to whitebox adversarial attacks. It offers a computationally efficient defense that works with any pre-trained classifier, though the advance is incremental because it builds on existing noise-injection and discretization ideas.

Adversarial attacks sharply reduce the accuracy of image classifiers. The paper proposes a defense that injects random Gaussian noise and then discretizes each pixel before classification. Theoretically, the authors show that this reduces the KL divergence between original and adversarial inputs. Empirically, on ImageNet under strong PGD attacks, the defense is more robust than adversarially-trained networks and the winning defenses of the NIPS 2017 competition.

Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers. In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. Theoretically, we show that our randomized discretization strategy reduces the KL divergence between original and adversarial inputs, leading to a lower bound on the classification accuracy of any classifier against any (potentially whitebox) $\ell_\infty$-bounded adversarial attack. Empirically, we evaluate our defense on adversarial examples generated by a strong iterative PGD attack. On ImageNet, our defense is more robust than adversarially-trained networks and the winning defenses of the NIPS 2017 Adversarial Attacks & Defenses competition.
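The three steps named in the abstract (add Gaussian noise, discretize each pixel, classify) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the discrete values are chosen, so the fixed `codebook` of colors, the noise level `sigma`, and the names `randomized_discretization`, `defended_predict`, and `classify` are all assumptions made for the example.

```python
import numpy as np


def randomized_discretization(image, codebook, sigma, rng):
    """Add Gaussian noise, then snap each pixel to its nearest codebook color.

    image:    (H, W, C) float array, assumed to lie in [0, 1]
    codebook: (K, C) float array of allowed colors (an assumption of this sketch)
    sigma:    standard deviation of the injected noise (hypothetical parameter)
    rng:      a numpy.random.Generator
    """
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    # Squared distance from every noisy pixel to every codebook color: (H, W, K).
    diff = noisy[..., None, :] - codebook[None, None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    nearest = dist2.argmin(axis=-1)  # (H, W) index of the closest color
    return codebook[nearest]         # (H, W, C) discretized image


def defended_predict(classify, image, codebook, sigma, rng):
    """Feed the randomized, discretized image to any pre-trained classifier.

    `classify` is a placeholder callable mapping an (H, W, C) array to a label.
    """
    return classify(randomized_discretization(image, codebook, sigma, rng))


# Illustrative codebook: a uniform 4x4x4 grid of RGB colors (64 entries).
levels = np.linspace(0.0, 1.0, 4)
codebook = np.array(np.meshgrid(levels, levels, levels)).reshape(3, -1).T

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))  # stand-in for a real input image
discretized = randomized_discretization(image, codebook, sigma=0.1, rng=rng)
```

Because the noise is resampled on every call, two calls on the same image may return different discretized outputs. That randomness is what makes the defense a randomized one rather than a fixed preprocessing step.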
