CV · Dec 4, 2024

Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks

arXiv:2412.03453v1 · 1 citation · h-index: 45 · Has Code · WACV
Originality: Incremental advance
AI Analysis

This work offers machine learning practitioners a training-free defense against adversarial attacks. The advance is incremental, since it builds on existing adversarial purification techniques.

The authors defend classifiers against adversarial attacks by using pre-trained Multiple Latent Variable Generative Models (MLVGMs) for adversarial purification: each image is autoencoded so that adversarial noise is removed while class information is preserved. With smaller models, the approach achieves results competitive with traditional methods.

Attackers can deliberately perturb classifiers' input with subtle noise, altering final predictions. Among proposed countermeasures, adversarial purification employs generative networks to preprocess input images, filtering out adversarial noise. In this study, we propose specific generators, defined Multiple Latent Variable Generative Models (MLVGMs), for adversarial purification. These models possess multiple latent variables that naturally disentangle coarse from fine features. Taking advantage of these properties, we autoencode images to maintain class-relevant information, while discarding and re-sampling any detail, including adversarial noise. The procedure is completely training-free, exploring the generalization abilities of pre-trained MLVGMs on the adversarial purification downstream task. Despite the lack of large models, trained on billions of samples, we show that smaller MLVGMs are already competitive with traditional methods, and can be used as foundation models. Official code released at https://github.com/SerezD/gen_adversarial.
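
The purification idea in the abstract (encode, keep the coarse latents, re-sample the fine ones, decode) can be illustrated with a toy sketch. This is not the authors' implementation: `toy_encode`, `toy_decode`, `purify`, and the parameters `keep_levels` and `prior_std` are hypothetical names, and a two-level block-mean decomposition of a 1-D signal stands in for a real pre-trained MLVGM. It assumes the input length is a multiple of `block`.

```python
import random
from typing import Callable, List, Optional, Sequence

Latents = List[List[float]]  # one vector per latent level, ordered coarse to fine


def toy_encode(x: Sequence[float], block: int = 4) -> Latents:
    """Hypothetical stand-in for an MLVGM encoder.

    Coarse latent: mean of each block. Fine latent: per-sample residual.
    Assumes len(x) is a multiple of `block`.
    """
    coarse = [sum(x[i:i + block]) / block for i in range(0, len(x), block)]
    fine = [v - coarse[i // block] for i, v in enumerate(x)]
    return [coarse, fine]


def toy_decode(latents: Latents, block: int = 4) -> List[float]:
    """Hypothetical decoder: upsample the coarse latent and add the fine detail."""
    coarse, fine = latents
    return [coarse[i // block] + f for i, f in enumerate(fine)]


def purify(
    x: Sequence[float],
    encode: Callable[[Sequence[float]], Latents],
    decode: Callable[[Latents], List[float]],
    keep_levels: int = 1,
    prior_std: float = 0.05,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Keep the first `keep_levels` latents and re-sample the rest from a Gaussian prior."""
    rng = rng or random.Random()
    purified: Latents = []
    for level, z in enumerate(encode(x)):
        if level < keep_levels:
            purified.append(list(z))  # coarse, class-relevant information is kept
        else:
            # fine detail, including any adversarial noise, is discarded and re-sampled
            purified.append([rng.gauss(0.0, prior_std) for _ in z])
    return decode(purified)


if __name__ == "__main__":
    clean = [0.2] * 4 + [0.8] * 4
    delta = [0.03, -0.03, 0.03, -0.03] * 2  # toy perturbation, zero mean per block
    adversarial = [c + d for c, d in zip(clean, delta)]
    print(purify(adversarial, toy_encode, toy_decode, rng=random.Random(0)))
```

In this toy setting the perturbation lives entirely in the fine latent, so purification removes it exactly. With a real MLVGM the split between class-relevant and detail information is only approximate, and the purified image would then be passed to the unchanged classifier.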

Code Implementations: 1 repo
