LG, CV, ML | Sep 3, 2018

Adversarial Attack Type I: Cheat Classifiers by Significant Changes

arXiv:1809.00594v2 | 18 citations
Originality: Incremental advance
AI Analysis

The paper addresses a security vulnerability in AI systems by revealing a distinct type of adversarial attack. The advance is incremental: it builds on known attack types but targets a different error type.

The paper tackles the problem of adversarial attacks by introducing a new type that cheats classifiers through significant changes, such as altering a face while still being recognized as the same person, and shows it is practical and effective on large-scale image datasets, with most examples evading detectors for existing attacks.

Despite the great success of deep neural networks, adversarial attacks can cheat some well-trained classifiers with small perturbations. In this paper, we propose another type of adversarial attack that can cheat classifiers with significant changes. For example, we can significantly change a face, yet well-trained neural networks still recognize the adversarial and the original example as the same person. Statistically, the existing adversarial attack increases Type II error while the proposed one targets Type I error; they are hence named Type II and Type I adversarial attacks, respectively. The two types of attack are equally important but essentially different, as we explain intuitively and evaluate numerically. To implement the proposed attack, a supervised variational autoencoder is designed, and the classifier is then attacked by updating the latent variables using gradient information. In addition, with pre-trained generative models, Type I attacks on latent spaces are investigated as well. Experimental results show that our method is practical and effective for generating Type I adversarial examples on large-scale image datasets. Most of the generated examples pass detectors designed to defend against Type II attacks, and the strengthening strategy is effective only against a specific type of attack, both implying that the underlying causes of Type I and Type II attacks are different.
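
As a rough illustration of the latent-variable update mentioned in the abstract, the sketch below runs gradient descent on a latent vector so that the decoded sample moves far from the original while the classifier keeps its original label. Everything concrete here is an assumption made for illustration: the linear decoder `D`, the logistic classifier `(w, b)`, the weight `alpha`, the cap `target_dist`, and the function name `type1_latent_attack` are hypothetical. The paper's supervised variational autoencoder and its actual loss are not reproduced.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def type1_latent_attack(z0, D, w, b, steps=100, lr=0.1, alpha=0.5, target_dist=3.0):
    """Toy stand-in (not the paper's method): push the decoded sample away from
    the original while penalizing loss of the classifier's original label."""
    x0 = D @ z0                                       # decoded original (assumed linear decoder)
    y0 = 1.0 if sigmoid(w @ x0 + b) >= 0.5 else 0.0   # label the classifier assigns to x0
    z = z0.astype(float)                              # working copy of the latent vector
    for _ in range(steps):
        x = D @ z
        diff = x - x0
        p = sigmoid(w @ x + b)
        # Gradient w.r.t. x of the cross-entropy between p and the original label y0.
        grad_x = (p - y0) * w
        # While the change is still below the cap, add the gradient of -alpha * ||x - x0||^2.
        if np.linalg.norm(diff) < target_dist:
            grad_x = grad_x - 2.0 * alpha * diff
        # Chain rule through the linear decoder, then one gradient-descent step on z.
        z = z - lr * (D.T @ grad_x)
    return z
```

The distance term is switched off once the decoded sample is `target_dist` away from the original, which keeps the otherwise unbounded objective from diverging. Note that this toy only checks the attacked classifier's output and the input distance; it has no oracle for the true class, which a faithful Type I evaluation would need.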
