CV CR LG IV | Nov 26, 2023

Adversarial Purification of Information Masking

Sitong Liu, Zhichao Lian, Shuangquan Zhang, Liang Xiao

arXiv:2311.15339v1 | 1.51 citations | h-index: 11 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the vulnerability of neural networks to adversarial attacks, a critical issue for AI safety and robustness, though the contribution appears incremental because it builds on existing purification methods.

The paper tackles the problem of adversarial attacks on neural networks by proposing a novel adversarial purification method called Information Mask Purification (IMPure), which achieves state-of-the-art results against nine adversarial attack methods on the ImageNet dataset with three classifier models.

Adversarial attacks craft minuscule, imperceptible perturbations to images in order to deceive neural networks. To counter them, adversarial purification methods transform adversarial input samples into clean output images. However, existing generative models fail to eliminate adversarial perturbations effectively, yielding less-than-ideal purification results. We emphasize the potential threat that residual adversarial perturbations pose to target models, and we quantitatively establish a relationship between perturbation scale and attack capability. Notably, the residual perturbations on a purified image stem primarily from the same-position patch and from similar patches of the adversarial sample. We propose a novel adversarial purification approach named Information Mask Purification (IMPure), which aims to eliminate adversarial perturbations extensively. Given an adversarial sample, we first mask part of the patch information, then reconstruct the masked patches so that the perturbations they carried are not passed on. We reconstruct all patches in parallel to obtain a cohesive image. Then, to protect the purified samples against potential perturbations from similar regions, we simulate this risk by randomly mixing the purified samples with the input samples before feeding them into the feature extraction network. Finally, we establish a combined constraint of pixel loss and perceptual loss to improve the model's reconstruction adaptability. Extensive experiments on the ImageNet dataset with three classifier models demonstrate that our approach achieves state-of-the-art results against nine adversarial attack methods. Implementation code and pre-trained weights are available at https://github.com/NoWindButRain/IMPure.
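The three ingredients named in the abstract (patch masking, random mixing of purified and input samples, and a combined pixel and perceptual loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; see the linked repository for that. The function names, the patch size, the ratios, the use of mean squared error for both loss terms, the loss weight, and the `features` callable standing in for a feature extraction network are all assumptions made for illustration. The reconstruction network itself is omitted.

```python
import numpy as np

def random_patch_mask(h, w, patch_size, ratio, rng):
    """Pixel-level boolean mask that selects a random fraction of patches."""
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    n_selected = int(round(ratio * gh * gw))
    flat = np.zeros(gh * gw, dtype=bool)
    flat[rng.choice(gh * gw, size=n_selected, replace=False)] = True
    grid = flat.reshape(gh, gw)
    # Expand the patch grid to pixel resolution.
    return grid.repeat(patch_size, axis=0).repeat(patch_size, axis=1)

def mask_patches(image, patch_size, mask_ratio, rng):
    """Zero out randomly chosen patches of an (H, W, C) image.

    Returns the masked image and the pixel mask (True = masked).
    """
    mask = random_patch_mask(image.shape[0], image.shape[1],
                             patch_size, mask_ratio, rng)
    masked = image.copy()
    masked[mask] = 0.0
    return masked, mask

def mix_patches(purified, original, patch_size, mix_ratio, rng):
    """Replace a random fraction of purified patches with input patches.

    Assumed reading of "randomly mixing the purified samples with the
    input samples"; the paper may mix differently.
    """
    mask = random_patch_mask(purified.shape[0], purified.shape[1],
                             patch_size, mix_ratio, rng)
    return np.where(mask[..., None], original, purified)

def combined_loss(recon, target, features, perceptual_weight=0.5):
    """Pixel loss plus weighted perceptual loss (both MSE, by assumption).

    `features` is a hypothetical callable mapping an image to a feature array.
    """
    pixel = np.mean((recon - target) ** 2)
    perceptual = np.mean((features(recon) - features(target)) ** 2)
    return pixel + perceptual_weight * perceptual
```

In a training loop, the masked image would be passed to a reconstruction model, the output mixed with the input via `mix_patches`, and the result scored with `combined_loss` using features from a pretrained network.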
