LG
Apr 19, 2021

Removing Adversarial Noise in Class Activation Feature Space

arXiv:2104.09197v1
37 citations
Originality Incremental advance
AI Analysis

This work addresses the problem of adversarial noise in deep neural networks for security-critical applications, representing an incremental improvement over existing preprocessing defenses.

The paper tackles the vulnerability of deep neural networks to adversarial noise by proposing a self-supervised adversarial training mechanism in class activation feature space, which significantly enhances adversarial robustness compared to previous state-of-the-art approaches, especially against unseen and adaptive attacks.

Deep neural networks (DNNs) are vulnerable to adversarial noise. Preprocessing-based defenses can largely remove adversarial noise by processing inputs. However, they are typically affected by the error amplification effect, especially in the face of continuously evolving attacks. To solve this problem, we propose to remove adversarial noise by implementing a self-supervised adversarial training mechanism in a class activation feature space. Specifically, we first maximize the disruptions to the class activation features of natural examples to craft adversarial examples. Then, we train a denoising model to minimize the distances between the adversarial examples and the natural examples in the class activation feature space. Empirical evaluations demonstrate that our method significantly enhances adversarial robustness in comparison to previous state-of-the-art approaches, especially against unseen adversarial attacks and adaptive attacks.
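
The two steps described in the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the paper works with deep networks, whereas this toy assumes a linear feature extractor `W`, treats "class activation features" as those features weighted by the class weights `w_c` (an assumption borrowed from class activation mapping), and uses a linear denoiser `P`. All names (`caf`, `craft_adversarial`, `train_denoiser`) and all hyperparameters are hypothetical.

```python
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def caf(W, w_c, x):
    """Toy class activation features (assumed form): features W @ x weighted by w_c."""
    return [wc * f for wc, f in zip(w_c, matvec(W, x))]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def input_grad(W, w_c, feats, target):
    """Gradient of ||caf(z) - target||^2 w.r.t. the input z, where feats = caf(z)."""
    resid = [wc * (f - t) for wc, f, t in zip(w_c, feats, target)]
    return [2 * sum(W[k][j] * resid[k] for k in range(len(W)))
            for j in range(len(W[0]))]

def craft_adversarial(W, w_c, x, eps=0.1, step=0.02, iters=20, seed=0):
    """Step 1: maximize the disruption to the class activation features
    inside an L-infinity ball of radius eps (signed gradient ascent)."""
    rng = random.Random(seed)
    # Random start: the gradient of the disruption is exactly zero at delta = 0.
    delta = [rng.uniform(-eps, eps) for _ in x]
    clean = caf(W, w_c, x)
    for _ in range(iters):
        adv = [xi + di for xi, di in zip(x, delta)]
        g = input_grad(W, w_c, caf(W, w_c, adv), clean)
        # Move along sign(g), then clip back into the eps-ball.
        delta = [max(-eps, min(eps, d + step * ((gj > 0) - (gj < 0))))
                 for d, gj in zip(delta, g)]
    return [xi + di for xi, di in zip(x, delta)]

def train_denoiser(W, w_c, pairs, lr=0.1, epochs=50):
    """Step 2: fit a linear denoiser P (hypothetical stand-in for the denoising
    model) so that caf(P @ x_adv) moves toward caf(x_nat)."""
    n = len(W[0])
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # start at identity
    for _ in range(epochs):
        for x_nat, x_adv in pairs:
            denoised = matvec(P, x_adv)
            g = input_grad(W, w_c, caf(W, w_c, denoised), caf(W, w_c, x_nat))
            # Chain rule: dLoss/dP[i][j] = dLoss/d(denoised[i]) * x_adv[j]
            P = [[P[i][j] - lr * g[i] * x_adv[j] for j in range(n)]
                 for i in range(n)]
    return P

# Toy data: 3 input dimensions, 2 feature channels (all values are made up).
W = [[1.0, -0.5, 0.2], [0.3, 0.8, -1.0]]
w_c = [0.9, -0.4]
x = [1.0, 0.5, -0.5]

x_adv = craft_adversarial(W, w_c, x)
P = train_denoiser(W, w_c, [(x, x_adv)])
before = sq_dist(caf(W, w_c, x_adv), caf(W, w_c, x))
after = sq_dist(caf(W, w_c, matvec(P, x_adv)), caf(W, w_c, x))
print("feature-space distance before/after denoising:", before, after)
```

In this toy the attack and the denoiser optimize the same quantity in opposite directions: the distance between adversarial and natural examples measured in the class activation feature space rather than in input space. The natural example serves as its own training target, which is the sense in which the sketch is self-supervised.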
