Robust Adversarial Learning via Sparsifying Front Ends
This paper addresses the vulnerability of neural networks to adversarial perturbations, offering a defense mechanism with theoretical guarantees and experimental validation, though the extension of the linear-classifier concepts to deep networks appears incremental.
The paper tackles the problem of adversarial attacks on deep neural networks by proposing a sparsifying front end defense, which reduces output distortion by a factor of roughly K/N for linear classifiers and shows efficacy on MNIST and CIFAR-10 datasets.
It is by now well known that small adversarial perturbations can induce classification errors in deep neural networks. In this paper, we take a bottom-up signal processing perspective on this problem and show that a systematic exploitation of sparsity in natural data is a promising tool for defense. For linear classifiers, we show that a sparsifying front end is provably effective against $\ell_{\infty}$-bounded attacks, reducing output distortion due to the attack by a factor of roughly $K/N$, where $N$ is the data dimension and $K$ is the sparsity level. We then extend this concept to deep networks, showing that a "locally linear" model can be used to develop a theoretical foundation for crafting attacks and defenses. We also devise attacks based on the locally linear model that outperform the well-known FGSM attack. We supplement our theoretical results with experiments on the MNIST and CIFAR-10 datasets, showing the efficacy of the proposed sparsity-based defense schemes.
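The $K/N$ claim for linear classifiers can be illustrated with a small numerical sketch. It is not the paper's implementation, and it rests on several simplifying assumptions: the data is exactly $K$-sparse in the standard basis (a real front end would use a basis such as wavelets), the nonzero entries are large compared with the attack budget so the retained support does not change, and the attack is fixed at $e = \epsilon \cdot \mathrm{sign}(w)$. The names `sparsify` and `distortion_demo` are hypothetical.

```python
import numpy as np

def sparsify(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    out[keep] = x[keep]
    return out

def distortion_demo(n=1000, k=50, eps=0.1, seed=0):
    """Return (undefended, defended) output distortion of a linear classifier."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)  # weights of a hypothetical linear classifier

    # Assumption: x is exactly k-sparse, with nonzero magnitudes in [1, 2],
    # well above eps, so the attack cannot change which entries are kept.
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

    # l_inf-bounded perturbation aligned with the classifier weights.
    e = eps * np.sign(w)

    # Without the front end, the output shifts by eps * ||w||_1.
    undefended = abs(w @ (x + e) - w @ x)
    # With the front end, only the k retained coordinates of e survive.
    defended = abs(w @ sparsify(x + e, k) - w @ sparsify(x, k))
    return undefended, defended

if __name__ == "__main__":
    undefended, defended = distortion_demo()
    print(f"undefended distortion: {undefended:.3f}")
    print(f"defended distortion:   {defended:.3f}")
    print(f"ratio: {defended / undefended:.3f} (K/N = {50 / 1000:.3f})")
```

Under these assumptions the front end discards the perturbation on the $N - K$ coordinates outside the support, so the ratio of defended to undefended distortion should land near $K/N$ for typical weights. The sketch does not cover attacks that try to change the retained support, or bases other than the standard one.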