LG ML | May 30, 2018

Robustifying Models Against Adversarial Attacks by Langevin Dynamics

Vignesh Srinivasan, Arturo Marban, Klaus-Robert Müller, Wojciech Samek, Shinichi Nakajima

arXiv:1805.12017v2 11.717 citations

Originality: Incremental advance

AI Analysis

The paper addresses the critical issue of model robustness in security-sensitive applications, though the contribution appears incremental because it builds on projection methods such as Defense-GAN.

The paper tackles the problem of adversarial attacks on deep learning models by proposing a defense strategy that relaxes adversarial samples onto the target class manifold using Langevin dynamics, achieving state-of-the-art performance against various attacks.

Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, many defense methods have been proposed, which, however, have been circumvented by newer attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains unsolved. This paper proposes a novel, simple yet effective defense strategy where adversarial samples are relaxed onto the underlying manifold of the (unknown) target class distribution. Specifically, our algorithm drives off-manifold adversarial samples towards high density regions of the data generating distribution of the target class by the Metropolis-adjusted Langevin algorithm (MALA) with the perceptual boundary taken into account. Although the motivation is similar to projection methods, e.g., Defense-GAN, our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion - projection is distributed broadly, and therefore no whitebox attack can accurately align the input so that MALADE moves it to a targeted untrained spot where the model predicts a wrong label. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies.
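
For readers unfamiliar with MALA, the following is a minimal sketch of the generic Metropolis-adjusted Langevin algorithm on a toy problem, not the authors' MALADE implementation. It assumes a target density whose log-density and gradient are known analytically (a standard 2-D Gaussian); the abstract does not say how the gradient of the data distribution is obtained, and the perceptual-boundary term is omitted. The names `mala_step`, `run_mala`, `log_p`, and `grad_log_p` and the step size are illustrative choices.

```python
import math
import random


def mala_step(x, log_p, grad_log_p, step, rng):
    """One MALA step: Langevin proposal followed by a Metropolis accept/reject."""
    grad = grad_log_p(x)
    # Proposal: drift along the gradient of log p, plus Gaussian noise.
    prop = [xi + step * gi + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, grad)]

    def log_q(dst, src):
        # Log proposal density of moving src -> dst (up to a constant).
        g_src = grad_log_p(src)
        sq = sum((d - s - step * g) ** 2 for d, s, g in zip(dst, src, g_src))
        return -sq / (4.0 * step)

    # Metropolis-Hastings log acceptance ratio.
    log_alpha = log_p(prop) + log_q(x, prop) - log_p(x) - log_q(prop, x)
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is always defined
    if math.log(u) < log_alpha:
        return prop, True
    return x, False


def run_mala(x0, log_p, grad_log_p, step=0.1, n_steps=500, seed=0):
    """Run a MALA chain from x0; return the final state and acceptance rate."""
    rng = random.Random(seed)
    x = list(x0)
    accepted = 0
    for _ in range(n_steps):
        x, ok = mala_step(x, log_p, grad_log_p, step, rng)
        accepted += ok
    return x, accepted / n_steps


# Toy target (an assumption for illustration): standard 2-D Gaussian.
def log_p(x):
    return -0.5 * sum(xi * xi for xi in x)


def grad_log_p(x):
    return [-xi for xi in x]


# Start far from the high-density region, as an off-manifold sample would be.
x_final, rate = run_mala([5.0, -5.0], log_p, grad_log_p)
print("final state:", x_final, "acceptance rate:", rate)
```

The noise term in the proposal is what makes the chain's end point random rather than a deterministic projection, which is the kind of dispersion the abstract contrasts with projection methods.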
