Learning Universal Adversarial Perturbations with Generative Models
This work addresses security concerns in machine learning systems by strengthening adversarial attack methods, though the contribution appears incremental because it builds on known universal adversarial perturbations.
The paper tackles the vulnerability of neural networks to adversarial examples by introducing universal adversarial networks, generative models that produce a single perturbation intended to fool a target classifier on any input, and it reports an improvement over existing universal adversarial attacks.
Neural networks are known to be vulnerable to adversarial examples: inputs that have been intentionally perturbed to remain visually similar to the source input but cause a misclassification. It was recently shown that, given a dataset and a classifier, there exist so-called universal adversarial perturbations: a single perturbation that causes a misclassification when applied to any input. In this work, we introduce universal adversarial networks, a generative network that is capable of fooling a target classifier when its generated output is added to a clean sample from a dataset. We show that this technique improves on known universal adversarial attacks.
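To make the attack setting concrete, the following is a minimal sketch of how a generated perturbation might be applied to clean samples and scored. It is not the authors' implementation: the L-infinity budget `eps`, the clipping range [0, 1], the fooling-rate metric, and the names `scale_to_linf`, `apply_perturbation`, `fooling_rate`, `toy_classifier`, and `toy_generator` are all assumptions introduced for illustration, and the toy classifier and generator are untrained stand-ins for real networks.

```python
import random


def scale_to_linf(delta, eps):
    """Rescale a raw generator output so its largest entry has magnitude eps.

    Assumption: the perturbation budget is an L-infinity bound; the abstract
    does not state which norm is used.
    """
    peak = max(abs(d) for d in delta)
    if peak == 0.0:
        return list(delta)
    return [eps * d / peak for d in delta]


def apply_perturbation(x, delta):
    """Add the perturbation to a clean sample and clip to [0, 1] (assumed range)."""
    return [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]


def fooling_rate(classify, samples, delta):
    """Fraction of samples whose predicted label changes when delta is added."""
    flipped = sum(
        1 for x in samples if classify(apply_perturbation(x, delta)) != classify(x)
    )
    return flipped / len(samples)


def toy_classifier(x):
    """Hypothetical stand-in for the target classifier: thresholds the mean."""
    return 1 if sum(x) / len(x) > 0.5 else 0


def toy_generator(z):
    """Hypothetical stand-in for the generative network: maps noise to a raw
    perturbation. A real generator would be a trained neural network."""
    return [-abs(zi) - 0.1 for zi in z]


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]       # noise fed to the generator
delta = scale_to_linf(toy_generator(z), eps=0.1)  # one perturbation for all inputs
samples = [[0.55] * 4, [0.9] * 4, [0.2] * 4]      # "clean" toy inputs
rate = fooling_rate(toy_classifier, samples, delta)
```

The same `delta` is reused for every sample, which is what makes the perturbation universal in this sketch. In a real setting the generator would be trained so that this fooling rate is high on held-out data.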