CV · Dec 4, 2024

Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks

Dario Serez, Marco Cristani, Alessio Del Bue, Vittorio Murino, Pietro Morerio

arXiv:2412.03453v1 · 3.71 citations · h-index: 45 · Has Code · WACV

Originality: Incremental advance

AI Analysis

This work offers machine learning practitioners a training-free defense against adversarial attacks. The advance is incremental, since it builds on existing adversarial purification techniques.

The authors defend classifiers against adversarial attacks by using pre-trained Multiple Latent Variable Generative Models (MLVGMs) for adversarial purification. The approach autoencodes each image to remove adversarial noise while preserving class information, and smaller models achieve results competitive with traditional methods.

Attackers can deliberately perturb classifiers' input with subtle noise, altering final predictions. Among proposed countermeasures, adversarial purification employs generative networks to preprocess input images, filtering out adversarial noise. In this study, we propose specific generators, termed Multiple Latent Variable Generative Models (MLVGMs), for adversarial purification. These models possess multiple latent variables that naturally disentangle coarse from fine features. Taking advantage of these properties, we autoencode images to maintain class-relevant information, while discarding and re-sampling any detail, including adversarial noise. The procedure is completely training-free, exploring the generalization abilities of pre-trained MLVGMs on the adversarial purification downstream task. Although no large MLVGMs trained on billions of samples are available, we show that smaller MLVGMs are already competitive with traditional methods, and can be used as foundation models. Official code released at https://github.com/SerezD/gen_adversarial.
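The purification idea in the abstract (encode, keep the coarse latents, re-sample the fine ones, decode) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the official repository: `ToyTwoLevelModel` is a hypothetical linear stand-in for a pre-trained MLVGM, `purify` and `resample_weight` are invented names, and the perturbation is deliberately confined to the fine subspace, which real adversarial noise need not be.

```python
import numpy as np


class ToyTwoLevelModel:
    """Hypothetical stand-in for a pre-trained MLVGM (not the authors' model).

    A fixed orthonormal basis is split into a 'coarse' block, which plays the
    role of class-relevant latents, and a 'fine' block, which plays the role
    of detail latents.
    """

    def __init__(self, dim=16, coarse_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        # QR of a random square matrix yields an orthonormal basis.
        basis, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
        self.coarse_basis = basis[:, :coarse_dim]
        self.fine_basis = basis[:, coarse_dim:]

    def encode(self, x):
        # Project the input onto each block to obtain the two latent variables.
        return self.coarse_basis.T @ x, self.fine_basis.T @ x

    def decode(self, z_coarse, z_fine):
        # Reconstruct the input from both latent variables.
        return self.coarse_basis @ z_coarse + self.fine_basis @ z_fine


def purify(model, x, resample_weight=1.0, rng=None):
    """Autoencode x, keeping coarse latents and re-sampling fine ones.

    resample_weight is an assumed knob: 0 keeps the encoded fine latent,
    1 replaces it entirely with a fresh sample from a standard normal prior.
    """
    rng = np.random.default_rng() if rng is None else rng
    z_coarse, z_fine = model.encode(x)
    z_prior = rng.standard_normal(z_fine.shape)
    z_fine_new = (1.0 - resample_weight) * z_fine + resample_weight * z_prior
    return model.decode(z_coarse, z_fine_new)


model = ToyTwoLevelModel()
x_clean = model.decode(np.ones(4), np.zeros(12))
# Toy assumption: the perturbation lives only in the fine subspace.
x_adv = x_clean + model.fine_basis @ (0.1 * np.ones(12))

# Same seed for both calls so the re-sampled details are identical.
out_clean = purify(model, x_clean, rng=np.random.default_rng(1))
out_adv = purify(model, x_adv, rng=np.random.default_rng(1))
```

In this toy setting the clean and perturbed inputs purify to the same output, and the coarse latent of the output matches that of the clean input. A downstream classifier that depends only on coarse content would therefore be unaffected by the perturbation. No training step is involved, mirroring the training-free claim, but the sketch says nothing about how well real MLVGMs separate class information from adversarial noise.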
