LG · CV · Jan 12, 2022

Adversarially Robust Classification by Conditional Generative Model Inversion

arXiv:2201.04733v1
Originality: Highly original
AI Analysis

This paper addresses the vulnerability of machine learning models to adversarial attacks. It offers a novel defense that is robust by construction, though it may be incremental relative to prior generative defenses such as Defense-GAN.

The paper tackles the problem of adversarial attacks on classifiers by proposing a method that uses conditional generative model inversion for classification, avoiding gradient obfuscation and achieving robustness without prior attack knowledge. It demonstrates extreme robustness against black-box attacks and improved robustness against white-box attacks compared to standard classifiers.

Most adversarial attack defense methods rely on obfuscating gradients. These methods are successful in defending against gradient-based attacks; however, they are easily circumvented either by attacks which do not use the gradient or by attacks which approximate and use the corrected gradient. Defenses that do not obfuscate gradients, such as adversarial training, exist, but these approaches generally make assumptions about the attack, such as its magnitude. We propose a classification model that does not obfuscate gradients and is robust by construction without assuming prior knowledge about the attack. Our method casts classification as an optimization problem where we "invert" a conditional generator trained on unperturbed, natural images to find the class that generates the closest sample to the query image. We hypothesize that a potential source of brittleness against adversarial attacks is the high-to-low-dimensional nature of feed-forward classifiers, which allows an adversary to find small perturbations in the input space that lead to large changes in the output space. A generative model, on the other hand, is typically a low-to-high-dimensional mapping. While the method is related to Defense-GAN, a critical difference is that our model uses a conditional generative model and inversion in place of the feed-forward classifier. Unlike Defense-GAN, which was shown to generate obfuscated gradients that are easily circumvented, we show that our method does not obfuscate gradients. We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks compared to naturally trained, feed-forward classifiers.
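
The idea of classifying by inverting a conditional generator can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear toy generator, the squared L2 distance, plain gradient descent from a zero latent, and all names (`generate`, `invert`, `classify`, `PARAMS`) are hypothetical and chosen only for this example. For each candidate class, the sketch searches for the latent code whose generated sample is closest to the query, then predicts the class with the smallest remaining distance.

```python
# Toy sketch: classification by inverting a conditional generator.
# Assumptions (not from the paper): linear generator, squared L2 loss,
# plain gradient descent starting from z = 0.

# Hypothetical "trained" parameters: class -> (mean, W), with x = mean + W z.
PARAMS = {
    0: ([0.0, 0.0, 0.0], [[1.0], [0.0], [0.0]]),  # class 0: line along axis 0
    1: ([0.0, 3.0, 0.0], [[0.0], [0.0], [1.0]]),  # class 1: line along axis 2, offset in axis 1
}
LATENT_DIM = 1


def generate(z, c, params):
    """Toy conditional generator G(z, c): low-dimensional z -> higher-dimensional x."""
    mean, W = params[c]
    return [m + sum(w * zj for w, zj in zip(row, z)) for m, row in zip(mean, W)]


def invert(x, c, params, latent_dim, steps=200, lr=0.05):
    """Minimise ||G(z, c) - x||^2 over z by gradient descent; return (z, final loss)."""
    _, W = params[c]
    z = [0.0] * latent_dim
    for _ in range(steps):
        residual = [g - xi for g, xi in zip(generate(z, c, params), x)]
        # Gradient of the squared L2 loss for a linear generator: 2 * W^T * residual.
        grad = [2.0 * sum(W[i][j] * residual[i] for i in range(len(x)))
                for j in range(latent_dim)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    residual = [g - xi for g, xi in zip(generate(z, c, params), x)]
    return z, sum(r * r for r in residual)


def classify(x, params, latent_dim):
    """Predict the class whose generator can reproduce x with the smallest loss."""
    losses = {c: invert(x, c, params, latent_dim)[1] for c in params}
    return min(losses, key=losses.get), losses


if __name__ == "__main__":
    label, losses = classify([0.7, 0.1, 0.0], PARAMS, LATENT_DIM)
    print(label, losses)
```

In this toy setting the prediction depends only on how well each class-conditional generator can reconstruct the query, so there is no feed-forward path from input to label. A real model would replace the linear generator with a trained conditional network and would likely need a more careful optimizer for the latent search.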
