CV AIMar 24, 2024

Adversarial Guided Diffusion Models for Adversarial Purification

Guang Lin, Zerui Tao, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao

arXiv:2403.16067v512.812 citationsh-index: 6Neural Networks

Originality Incremental advance

AI Analysis

This work addresses adversarial robustness in machine learning models, offering an incremental improvement over existing diffusion-based purification methods.

The paper tackles the problem of adversarial purification in diffusion models, where existing methods may preserve adversarial perturbations or compromise semantic information, and proposes an adversarial guided diffusion model (AGDM) that improves robust accuracy by up to 7.30% on CIFAR-10.

Diffusion model (DM) based adversarial purification (AP) has proven to be a powerful defense method that can remove adversarial perturbations and generate a purified example without threats. In principle, the pre-trained DMs can only ensure that purified examples conform to the same distribution of the training data, but it may inadvertently compromise the semantic information of input examples, leading to misclassification of purified examples. Recent advancements introduce guided diffusion techniques to preserve semantic information while removing the perturbations. However, these guidances often rely on distance measures between purified examples and diffused examples, which can also preserve perturbations in purified examples. To further unleash the robustness power of DM-based AP, we propose an adversarial guided diffusion model (AGDM) by introducing a novel adversarial guidance that contains sufficient semantic information but does not explicitly involve adversarial perturbations. The guidance is modeled by an auxiliary neural network obtained with adversarial training, considering the distance in the latent representations rather than at the pixel-level values. Extensive experiments are conducted on CIFAR-10, CIFAR-100 and ImageNet to demonstrate that our method is effective for simultaneously maintaining semantic information and removing the adversarial perturbations. In addition, comprehensive comparisons show that our method significantly enhances the robustness of existing DM-based AP, with an average robust accuracy improved by up to 7.30% on CIFAR-10.

View on arXiv PDF

Similar