Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation
This work addresses the need for diverse and controllable image generation in low-level vision tasks without the cost of retraining, though the contribution is incremental because it builds on existing adversarial attack techniques.
The paper tackles the problem that deterministic conditional image generative models produce a fixed output for the same input, which is unsuitable for subjective tasks like inpainting or style transfer. It proposes a plug-in adversarial attack method that adds micro perturbations to the input condition, so that a pre-trained model generates diverse and controllable results without retraining. Experiments on several conditional image generation tasks are reported to show high-quality outcomes.
Existing generative adversarial network (GAN) based conditional image generative models typically produce a fixed output for the same conditional input, which is unreasonable for highly subjective tasks such as large-mask image inpainting or style transfer. On the other hand, GAN-based diverse image generative methods require retraining or fine-tuning the network, or designing complex noise injection functions; they are therefore computationally expensive or task-specific, or they struggle to generate high-quality results. Given that many deterministic conditional image generative models can already produce high-quality yet fixed results, we raise an intriguing question: is it possible for pre-trained deterministic conditional image generative models to generate diverse results without changing network structures or parameters? To answer this question, we re-examine conditional image generation tasks from the perspective of adversarial attack and propose a simple and efficient plug-in, projected gradient descent (PGD)-like method for diverse and controllable image generation. The key idea is to attack the pre-trained deterministic generative model by adding a micro perturbation to the input condition. In this way, diverse results can be generated without any adjustment of network structures or fine-tuning of the pre-trained models. In addition, the generated results can be controlled by specifying the attack direction according to a reference text or image. Our work opens the door to applying adversarial attack to low-level vision tasks, and experiments on various conditional image generation tasks demonstrate the effectiveness and superiority of the proposed method.
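The key idea described above (a small, bounded perturbation of the input condition, found by PGD-like steps, with the model itself left frozen) can be illustrated with a minimal sketch. This is not the authors' implementation. The generator `toy_generator`, its weights, the squared-distance objectives, the helper names, and the values of `eps`, `alpha`, and `steps` are all assumptions chosen for illustration, and finite differences stand in for the automatic differentiation a real pipeline would use.

```python
import math
import random

# Hypothetical stand-in for a frozen, deterministic conditional generator.
# A real model would be a pre-trained network; these weights are made up.
WEIGHTS = [[0.9, -0.4, 0.2], [0.1, 0.8, -0.5]]

def toy_generator(cond):
    return [math.tanh(sum(w * c for w, c in zip(row, cond))) for row in WEIGHTS]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def numeric_grad(fn, x, h=1e-5):
    # Central finite differences; a real pipeline would use autograd instead.
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((fn(up) - fn(down)) / (2 * h))
    return grad

def sign(v):
    return (v > 0) - (v < 0)

def pgd_perturb(generator, cond, objective, eps=0.05, alpha=0.01, steps=20, seed=0):
    """Ascend objective(generator(cond + delta)) while keeping |delta_i| <= eps."""
    rng = random.Random(seed)
    delta = [rng.uniform(-eps, eps) for _ in cond]  # random start inside the eps-box

    def score(d):
        return objective(generator([c + di for c, di in zip(cond, d)]))

    for _ in range(steps):
        grad = numeric_grad(score, delta)
        # Signed gradient step, then projection (clipping) back into the eps-box.
        delta = [max(-eps, min(eps, d + alpha * sign(g))) for d, g in zip(delta, grad)]
    return delta

cond = [0.3, -0.2, 0.5]
base = toy_generator(cond)  # the fixed output of the unattacked model

# Diversity (assumed objective): push the output away from the fixed result.
delta_div = pgd_perturb(toy_generator, cond, lambda out: sq_dist(out, base))
out_div = toy_generator([c + d for c, d in zip(cond, delta_div)])

# Control (assumed objective): pull the output toward a made-up reference vector.
target = [1.0, -1.0]
delta_ctl = pgd_perturb(toy_generator, cond, lambda out: -sq_dist(out, target))
out_ctl = toy_generator([c + d for c, d in zip(cond, delta_ctl)])
```

In this sketch the generator's weights are never modified; only `delta` changes. Swapping the objective is what switches between the diverse and the controlled setting, and changing `seed` changes the random start, which may lead to a different perturbed output. How a reference text or image is turned into an attack direction is not specified in the abstract, so the target vector here is purely a placeholder.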