LG · Jan 29, 2018

Certified Defenses against Adversarial Examples

arXiv:1801.09344v2 · 1014 citations
Originality: Highly original
AI Analysis

This paper addresses the vulnerability of neural networks to adversarial attacks and offers a foundational step toward ending the arms race between attacks and defenses, though its analysis is limited to networks with one hidden layer.

The paper tackles adversarial examples by proposing certified defenses: for a given network and test input, a certificate guarantees that no attack within the perturbation budget can push the error above a computed threshold. On MNIST, the certified upper bound on test error is 35% for perturbations of up to ε = 0.1 per pixel.
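A per-example certificate becomes a test-error guarantee by counting the points it cannot protect: a point counts against the model if it is already misclassified or if the certificate cannot rule out a flip inside the ε-ball. The minimal Python sketch below shows this bookkeeping for a one-hidden-layer ReLU network f(x) = V·relu(Wx). It is an illustration under stated assumptions, not the paper's method: in place of the semidefinite certificate it uses a deliberately crude, hypothetical norm-based bound (ReLU slopes lie in [0, 1], so each margin can grow by at most ε·Σ_i |v_i|·‖W_i‖₁), and all function names and weights are made up for this example.

```python
# Hypothetical sketch: certified error for f(x) = V relu(W x) under l_inf attacks.
# The bound used here is a crude stand-in, NOT the paper's SDP certificate.

def relu(z):
    return max(0.0, z)

def forward(W, V, x):
    """Logits V @ relu(W @ x) for lists-of-lists W (m x d) and V (k x m)."""
    h = [relu(sum(w * xk for w, xk in zip(row, x))) for row in W]
    return [sum(vj * hi for vj, hi in zip(row, h)) for row in V]

def norm_bound(W, v, eps):
    """Upper bound on how much the margin v . relu(W x) can grow when every
    input coordinate moves by at most eps (uses ReLU slopes in [0, 1])."""
    return eps * sum(abs(vi) * sum(abs(w) for w in row) for vi, row in zip(v, W))

def is_certified(W, V, x, y, eps):
    """True only if no perturbation with ||delta||_inf <= eps can make any
    class j != y score at least as high as the true class y."""
    logits = forward(W, V, x)
    for j in range(len(V)):
        if j == y:
            continue
        v = [a - b for a, b in zip(V[j], V[y])]  # margin weights V_j - V_y
        if logits[j] - logits[y] + norm_bound(W, v, eps) >= 0:
            return False  # misclassified already, or the bound cannot rule out a flip
    return True

def certified_error(W, V, data, eps):
    """Fraction of (x, y) pairs the certificate cannot protect; this
    upper-bounds the error of ANY attack confined to the eps-ball."""
    return sum(not is_certified(W, V, x, y, eps) for x, y in data) / len(data)

# Toy two-class network with hypothetical weights.
W = [[1.0, -1.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0]]
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
for eps in (0.1, 0.2):
    print(eps, certified_error(W, V, data, eps))
```

With these toy weights both points are certified at ε = 0.1 (certified error 0.0), but at ε = 0.2 the crude bound is too loose to protect either one (certified error 1.0). A tighter bound certifies more points and therefore lowers the certified error, which is why the quality of the certificate matters as much as the network itself.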

While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but they are often followed by new, stronger attacks that defeat them. Can we somehow end this arms race? In this work, we study this problem for neural networks with one hidden layer. First, we propose a method based on a semidefinite relaxation that, for a given network and test input, outputs a certificate that no attack can force the error to exceed a certain value. Second, because this certificate is differentiable, we jointly optimize it with the network parameters, providing an adaptive regularizer that encourages robustness against all attacks. On MNIST, our approach produces a network and a certificate guaranteeing that no attack that perturbs each pixel by at most ε = 0.1 can cause more than 35% test error.
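A certificate of this kind has to control how far a margin f_j(x̃) - f_y(x̃) can rise inside the ε-ball. Because ReLU slopes lie in [0, 1], that rise is at most ε times the maximum of sᵀ diag(v) W u over s ∈ [0,1]^m and u ∈ [-1,1]^d, where v = V_j - V_y. The sketch below is our own illustration, not code from the paper: it computes this maximum exactly by enumerating the 2^m activation patterns, which is only feasible for tiny hidden layers. As an assumption about the method, we read the semidefinite relaxation as a polynomial-time convex upper bound on a hard maximization of this kind; the SDP itself is not reproduced here, and the function names and weights are hypothetical.

```python
from itertools import product

def exact_bilinear_bound(W, v):
    """Exact max of sum_i s_i * v_i * (W u)_i over s in [0,1]^m, u in [-1,1]^d.

    For a fixed s the best u yields ||W^T diag(v) s||_1, which is convex in s,
    so the maximum is attained at a vertex s in {0,1}^m. Enumerating all 2^m
    vertices is exact but exponential; a relaxation would upper-bound it instead.
    """
    m, d = len(W), len(W[0])
    best = 0.0  # s = 0 gives 0, so the maximum is never negative
    for s in product((0.0, 1.0), repeat=m):
        # g = W^T diag(v) s: the margin's input gradient for activation pattern s
        g = [sum(s[i] * v[i] * W[i][k] for i in range(m)) for k in range(d)]
        best = max(best, sum(abs(g_k) for g_k in g))
    return best

def naive_bilinear_bound(W, v):
    """Crude upper bound sum_i |v_i| * ||W_i||_1 on the same maximum."""
    return sum(abs(v_i) * sum(abs(w) for w in row) for v_i, row in zip(v, W))

# Toy example with hidden width m = 2 and input dimension d = 2 (hypothetical weights).
W = [[1.0, -1.0], [0.5, 0.5]]
v = [-1.0, 1.0]       # v = V_j - V_y for an identity readout with y = 0, j = 1
eps = 0.2
clean_margin = -0.5   # f_j(x) - f_y(x) at x = [1, 0] for these weights
print(exact_bilinear_bound(W, v), naive_bilinear_bound(W, v))
print("certified (exact):", clean_margin + eps * exact_bilinear_bound(W, v) < 0)
print("certified (naive):", clean_margin + eps * naive_bilinear_bound(W, v) < 0)
```

For these weights the exact term is 2.0 while the crude norm bound is 3.0, so at ε = 0.2 the clean margin of -0.5 is certified by the exact term (-0.5 + 0.4 < 0) but not by the crude one (-0.5 + 0.6 > 0). The enumeration grows as 2^m, which is why a tractable relaxation is needed for realistic hidden-layer widths.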

Code implementations: 4 repos