Fine-grained Synthesis of Unrestricted Adversarial Examples
This work addresses the vulnerability of computer vision models to adversarial attacks in real-world scenarios where traditional norm-bounded constraints do not apply.
The paper generates unrestricted adversarial examples by learning fine-grained stylistic and stochastic modifications with generative models. The resulting attacks bypass certified defenses while keeping a natural appearance, and adversarial training on these examples improves model performance on clean images.
We propose a novel approach for generating unrestricted adversarial examples by manipulating fine-grained aspects of image generation. Unlike existing unrestricted attacks that typically hand-craft geometric transformations, we learn stylistic and stochastic modifications leveraging state-of-the-art generative models. This allows us to manipulate an image in a controlled, fine-grained manner without being bounded by a norm threshold. Our approach can be used for targeted and non-targeted unrestricted attacks on classification, semantic segmentation and object detection models. Our attacks can bypass certified defenses, yet our adversarial images look indistinguishable from natural images as verified by human evaluation. Moreover, we demonstrate that adversarial training with our examples improves the performance of the model on clean images without requiring any modifications to the architecture. We perform experiments on the high-resolution datasets LSUN, CelebA-HQ and COCO-Stuff to validate the efficacy of our proposed approach.
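To make the idea of attacking through stylistic and stochastic inputs concrete, the following is a minimal toy sketch, not the method's actual implementation. It assumes a gradient-based search over two latent codes, which the abstract does not spell out. The linear `generate` function and the logistic `classify` function are hypothetical stand-ins for a real generative model and a real target classifier, and all names, weights and step sizes are illustrative assumptions. The point it shows is that the perturbation is applied to the style and noise codes rather than to pixels, so no pixel-space norm bound is involved.

```python
import math

def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def generate(style, noise, w_style, w_noise):
    """Toy linear 'generator' (assumption): each pixel mixes a style code and a noise code."""
    image = []
    for row_s, row_n in zip(w_style, w_noise):
        pixel = sum(w * s for w, s in zip(row_s, style))
        pixel += sum(w * n for w, n in zip(row_n, noise))
        image.append(pixel)
    return image

def classify(image, w_clf):
    """Toy binary classifier (assumption): logistic regression, returns P(class 1)."""
    logit = sum(w * x for w, x in zip(w_clf, image))
    return 1.0 / (1.0 + math.exp(-logit))

def attack(style, noise, w_style, w_noise, w_clf, true_label, step=0.05, iters=200):
    """Non-targeted attack: signed gradient ascent on the cross-entropy loss
    with respect to the style and noise codes, never the pixels directly."""
    style, noise = list(style), list(noise)
    n_pixels = len(w_clf)
    for _ in range(iters):
        p = classify(generate(style, noise, w_style, w_noise), w_clf)
        if (p >= 0.5) != (true_label == 1):
            break  # the classifier is already fooled
        # Derivative of cross-entropy w.r.t. the logit for a sigmoid output.
        d_logit = p - true_label
        # Chain rule through the linear classifier and the linear generator.
        g_style = [d_logit * sum(w_clf[i] * w_style[i][j] for i in range(n_pixels))
                   for j in range(len(style))]
        g_noise = [d_logit * sum(w_clf[i] * w_noise[i][j] for i in range(n_pixels))
                   for j in range(len(noise))]
        # Step in the direction that increases the loss.
        style = [s + step * sign(g) for s, g in zip(style, g_style)]
        noise = [n + step * sign(g) for n, g in zip(noise, g_noise)]
    return style, noise

# Hypothetical toy setup: 4 pixels, 2 style dimensions, 2 noise dimensions.
W_STYLE = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
W_NOISE = [[0.1, 0.0], [0.0, 0.1], [0.1, -0.1], [0.0, 0.2]]
W_CLF = [1.0, -1.0, 0.5, 0.5]
STYLE, NOISE = [1.0, 0.2], [0.0, 0.0]

adv_style, adv_noise = attack(STYLE, NOISE, W_STYLE, W_NOISE, W_CLF, true_label=1)
p_clean = classify(generate(STYLE, NOISE, W_STYLE, W_NOISE), W_CLF)
p_adv = classify(generate(adv_style, adv_noise, W_STYLE, W_NOISE), W_CLF)
print("clean P(class 1):", p_clean)
print("adversarial P(class 1):", p_adv)
```

In this toy setup the clean sample is classified as class 1 and the sample generated from the modified codes is classified as class 0. A targeted variant would instead descend the loss toward a chosen label, and a realistic version would replace both stand-ins with a trained generator and the model under attack.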