Featurized Bidirectional GAN: Adversarial Defense via Adversarially Learned Semantic Inference
This paper addresses the problem of defending machine learning models against adversarial attacks, but it appears incremental because it builds on existing GAN-based defense approaches.
The paper tackles the vulnerability of deep neural networks to adversarial attacks by proposing FBGAN, a defense method that extracts semantic features to filter perturbations, resulting in reconstructed denoised data that improves classifier robustness.
Deep neural networks have been shown to be vulnerable to adversarial attacks, in which small perturbations intentionally added to the original inputs can fool the classifier. In this paper, we propose a defense method, Featurized Bidirectional Generative Adversarial Networks (FBGAN), that extracts the semantic features of the input and filters out the non-semantic perturbation. FBGAN is pre-trained on the clean dataset in an unsupervised manner, adversarially learning a bidirectional mapping between the high-dimensional data space and the low-dimensional semantic space; mutual information is also applied to disentangle the semantically meaningful features. After the bidirectional mapping, adversarial data can be reconstructed as denoised data, which can then be fed into any pre-trained classifier. We empirically demonstrate the quality of the reconstructed images and the effectiveness of the defense.
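The encode, reconstruct, then classify flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `toy_encoder`, `toy_decoder`, `toy_classifier`, and `defend_and_classify` are hypothetical, and the hand-written functions merely stand in for the adversarially trained FBGAN networks and the pre-trained classifier. The sketch shows only the inference-time pipeline, not the adversarial training or the mutual-information term.

```python
from typing import Callable, List

Vector = List[float]

SEMANTIC_DIM = 2  # hypothetical size of the low-dimensional semantic space


def toy_encoder(x: Vector) -> Vector:
    """Stand-in for the learned encoder: keep only the 'semantic' coordinates."""
    return x[:SEMANTIC_DIM]


def toy_decoder(z: Vector, data_dim: int = 4) -> Vector:
    """Stand-in for the learned generator: map a semantic code back to data space."""
    return z + [0.0] * (data_dim - len(z))


def toy_classifier(x: Vector) -> int:
    """Stand-in pre-trained classifier that is overly sensitive to coordinate 2."""
    return 1 if x[0] + 10.0 * x[2] > 0 else 0


def defend_and_classify(
    x: Vector,
    encoder: Callable[[Vector], Vector],
    decoder: Callable[[Vector], Vector],
    classifier: Callable[[Vector], int],
) -> int:
    """Project the input to the semantic space, reconstruct it, then classify."""
    z = encoder(x)           # data space -> semantic space
    x_denoised = decoder(z)  # semantic space -> data space (perturbation dropped)
    return classifier(x_denoised)


# A clean input and a small perturbation confined to a non-semantic coordinate.
x_clean = [1.0, 0.5, 0.0, 0.0]
perturbation = [0.0, 0.0, -0.2, 0.0]
x_adv = [a + b for a, b in zip(x_clean, perturbation)]

undefended = toy_classifier(x_adv)  # classifier applied directly to the attack
defended = defend_and_classify(x_adv, toy_encoder, toy_decoder, toy_classifier)
```

In this toy setting the perturbation lives entirely outside the semantic coordinates, so the reconstruction discards it and the classifier recovers the clean label. A real perturbation need not be separable this cleanly; how well the learned mapping filters it is what the paper evaluates empirically.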