LG · CV · IV · Apr 26, 2020

Towards Feature Space Adversarial Attack

arXiv:2004.12385v2 · 27 citations
AI Analysis

This work exposes a security vulnerability in image classification systems: existing pixel-space defenses offer little protection against style-based attacks.

The paper proposes a feature-space adversarial attack on deep neural networks for image classification. The attack perturbs style features instead of input pixels and generates more natural-looking adversarial samples than state-of-the-art unbounded attacks.

We propose a new adversarial attack on deep neural networks for image classification. Unlike most existing attacks, which directly perturb input pixels, our attack perturbs abstract features, specifically features that denote styles. These include interpretable styles, such as vivid colors and sharp outlines, as well as uninterpretable ones. The attack induces model misclassification by injecting imperceptible style changes through an optimization procedure. We show that our attack can generate adversarial samples that are more natural-looking than those of state-of-the-art unbounded attacks. Our experiments also indicate that existing pixel-space adversarial attack detection and defense techniques can hardly ensure robustness in the style-related feature space.
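To make the idea of a bounded style perturbation concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how style is represented, so the sketch assumes the channel-wise mean and standard deviation of a feature map, a common choice in style transfer. The names `channel_stats`, `perturb_style`, and the bound `eps` are hypothetical. The encoder, decoder, and the optimization loop that searches for misclassifying scales are omitted.

```python
import math


def channel_stats(channel):
    """Return (mean, std) of one feature channel given as a flat list of floats."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return mean, math.sqrt(var)


def clip(value, low, high):
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(high, value))


def perturb_style(features, mean_scale, std_scale, eps=0.1, tiny=1e-8):
    """Rescale each channel's mean and std while keeping its normalized content.

    Assumption: "style" is the per-channel (mean, std) pair. Each scale is
    clipped to [1 - eps, 1 + eps] so the style change stays small. In an
    attack, an optimizer would choose the scales; here they are given.
    """
    perturbed = []
    for channel, m_scale, s_scale in zip(features, mean_scale, std_scale):
        mean, std = channel_stats(channel)
        m_scale = clip(m_scale, 1.0 - eps, 1.0 + eps)
        s_scale = clip(s_scale, 1.0 - eps, 1.0 + eps)
        new_mean, new_std = mean * m_scale, std * s_scale
        # Normalize the content, then re-apply the perturbed style statistics.
        perturbed.append(
            [(v - mean) / (std + tiny) * new_std + new_mean for v in channel]
        )
    return perturbed


# Usage: one channel with four activations; the std scale 2.0 is clipped to 1.1.
features = [[1.0, 2.0, 3.0, 4.0]]
adversarial_features = perturb_style(features, mean_scale=[1.05], std_scale=[2.0])
```

In a full attack along these lines, the perturbed features would be decoded back into an image and the scales updated by gradient steps on the classifier's loss. Clipping the scales plays the role that a pixel-norm bound plays in pixel-space attacks.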

Code Implementations: 1 repo
