LG · CR · ML · Jan 6, 2020

Generating Semantic Adversarial Examples via Feature Manipulation

arXiv:2001.02297v2 · 7 citations
AI Analysis

This paper addresses adversarial robustness in AI systems, particularly image classifiers, by introducing a more practical and interpretable attack method. The contribution is incremental, since it builds on existing latent-space perturbation approaches.

The paper tackles the vulnerability of deep neural networks to adversarial attacks by proposing a method that generates semantic adversarial examples via feature manipulation, achieving effective attacks on black-box classifiers and demonstrating the existence of universal semantic adversarial examples.

The vulnerability of deep neural networks to adversarial attacks has been widely demonstrated (e.g., adversarial example attacks). Traditional attacks apply unstructured pixel-wise perturbations to fool the classifier. An alternative approach is to perturb the latent space. However, such perturbations are hard to control due to the lack of interpretability and disentanglement. In this paper, we propose a more practical adversarial attack by designing structured perturbations with semantic meaning. Our proposed technique manipulates the semantic attributes of images via disentangled latent codes. The intuition behind our technique is that images in similar domains have some commonly shared but theme-independent semantic attributes, e.g., the thickness of lines in handwritten digits, that can be bidirectionally mapped to disentangled latent codes. We generate adversarial perturbations by manipulating a single latent code or a combination of these codes, and we propose two unsupervised semantic manipulation approaches, vector-based disentangled representation and feature map-based disentangled representation, which differ in the complexity of the latent codes and the smoothness of the reconstructed images. We conduct extensive experimental evaluations on real-world image data to demonstrate the power of our attacks against black-box classifiers. We further demonstrate the existence of a universal, image-agnostic semantic adversarial example.
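To make the idea of perturbing a single latent code concrete, here is a minimal toy sketch. It is not the authors' implementation: the linear `decode` function stands in for a learned disentangled decoder, `classify` is a hypothetical black-box model that returns only a label, and every name and value below is an assumption chosen for illustration.

```python
import numpy as np

def decode(z, basis):
    """Toy stand-in for a learned decoder: maps latent codes to a flat 'image'."""
    return z @ basis

def semantic_attack(z, basis, classify, dim, max_shift=3.0, steps=61):
    """Shift one latent code until the black-box label changes.

    Returns (shift, adversarial_image), or None if no shift in range works.
    """
    original_label = classify(decode(z, basis))
    # Try shifts in order of increasing magnitude so the first hit is the smallest.
    shifts = sorted(np.linspace(-max_shift, max_shift, steps), key=abs)
    for shift in shifts:
        z_adv = z.copy()
        z_adv[dim] += shift  # perturb a single (assumed disentangled) attribute
        x_adv = decode(z_adv, basis)
        if classify(x_adv) != original_label:
            return float(shift), x_adv
    return None

# Hypothetical setup: 2 latent codes, 3 "pixels", and a threshold classifier
# that we can only query for labels (black-box access).
basis = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, -0.5]])
classify = lambda x: int(x.sum() > 1.0)

z = np.array([0.2, 0.3])
result = semantic_attack(z, basis, classify, dim=0)
```

In this toy setting the search finds a shift of about 0.4 along the first latent code, which is enough to flip the label. A real attack would replace `decode` with a trained generative model and `classify` with queries to the target classifier.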

