PuVAE: A Variational Autoencoder to Purify Adversarial Examples
The paper addresses adversarial attacks on deep learning systems, offering a fast and robust defense method, though the contribution appears incremental because it builds on existing purification approaches.
The paper tackles the vulnerability of deep neural networks to adversarial attacks by proposing PuVAE, a variational autoencoder that purifies adversarial examples by projecting them onto class manifolds. It reports performance competitive with state-of-the-art defense methods and inference approximately 130 times faster than Defense-GAN.
Deep neural networks are widely used and exhibit excellent performance in many areas. However, they are vulnerable to adversarial attacks that compromise the network at inference time by applying elaborately designed perturbations to the input data. Although several defense methods have been proposed to address specific attacks, other attack methods can circumvent these defense mechanisms. Therefore, we propose the Purifying Variational Autoencoder (PuVAE), a method to purify adversarial examples. The proposed method eliminates an adversarial perturbation by projecting an adversarial example onto the manifold of each class and selecting the closest projection as the purified sample. We experimentally illustrate the robustness of PuVAE against various attack methods without any prior knowledge of the attacks. In our experiments, the proposed method exhibits performance competitive with state-of-the-art defense methods, and its inference is approximately 130 times faster than that of Defense-GAN, the state-of-the-art purifier model.
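The purification step described in the abstract (project the input onto each class manifold, then keep the closest projection) can be illustrated with a minimal sketch. This is not the authors' implementation: `purify`, `rms_distance`, and `toy_reconstruct` are hypothetical names, the root-mean-square distance is an assumed choice of closeness measure, and the toy "manifolds" are single prototype points standing in for the output of a trained class-conditional VAE.

```python
import math

def rms_distance(a, b):
    """Root-mean-square distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a))

def purify(x, reconstruct, num_classes):
    """Project x onto every class manifold and return the closest projection.

    `reconstruct(x, y)` is a hypothetical callable standing in for a trained
    class-conditional VAE: it returns the reconstruction of x conditioned on label y.
    """
    candidates = [reconstruct(x, y) for y in range(num_classes)]
    distances = [rms_distance(c, x) for c in candidates]
    best = min(range(num_classes), key=distances.__getitem__)
    return candidates[best], best

# Toy stand-in for the VAE: each class "manifold" is a single prototype point.
PROTOTYPES = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

def toy_reconstruct(x, y):
    return PROTOTYPES[y]

x_adv = [0.9, 1.2]  # a perturbed input lying near the class-1 prototype
purified, label = purify(x_adv, toy_reconstruct, num_classes=3)
```

In this sketch the purified sample, not the raw input, would then be passed to the downstream classifier. The returned index only identifies which class-conditional projection was closest. Because purification here needs only one reconstruction per class rather than an iterative search, a single batched forward pass could produce all candidates at once.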