CV · AI · LG · Mar 26, 2022

Reverse Engineering of Imperceptible Adversarial Image Perturbations

arXiv:2203.14145v2 · 25 citations · h-index: 50
Originality: Incremental advance
AI Analysis

This paper addresses a novel problem in adversarial machine learning: recovering the perturbation and the original image after an attack. That capability could strengthen the security of image classification systems, although the method is incremental in that it builds on existing denoising and attack techniques.

The paper tackles the problem of reverse-engineering adversarial perturbations from attacked images, proposing a new paradigm called Reverse Engineering of Deceptions (RED) to estimate perturbations and recover original images. It introduces a Class-Discriminative Denoising framework (CDD-RED) and reports that it is effective across pixel-level, prediction-level, and attribution-level metrics and across multiple attack methods.

It has been well recognized that neural network based image classifiers are easily fooled by images with tiny perturbations crafted by an adversary. There has been a vast volume of research on generating and defending against such adversarial attacks. However, the following problem is left unexplored: How to reverse-engineer adversarial perturbations from an adversarial image? This leads to a new adversarial learning paradigm, Reverse Engineering of Deceptions (RED). If successful, RED allows us to estimate adversarial perturbations and recover the original images. However, carefully crafted, tiny adversarial perturbations are difficult to recover by optimizing a unilateral RED objective. For example, a pure image denoising method may overfit to minimizing the reconstruction error while hardly preserving the classification properties of the true adversarial perturbations. To tackle this challenge, we formalize the RED problem and identify a set of principles crucial to the RED approach design. In particular, we find that prediction alignment and proper data augmentation (in terms of spatial transformations) are two criteria for achieving a generalizable RED approach. By integrating these RED principles with image denoising, we propose a new Class-Discriminative Denoising based RED framework, termed CDD-RED. Extensive experiments demonstrate the effectiveness of CDD-RED under different evaluation metrics (ranging from pixel-level and prediction-level to attribution-level alignment) and a variety of attack generation methods (e.g., FGSM, PGD, CW, AutoAttack, and adaptive attacks).
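
To make the contrast between a pure denoising objective and one with prediction alignment concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The toy linear classifier, the KL-based alignment terms, the weighting `lam`, and all function names are assumptions chosen for illustration. The abstract states only that reconstruction error alone is insufficient and that prediction alignment is one of the design criteria.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(weights, image):
    # Toy linear classifier (an assumption): one weight row per class.
    logits = [sum(w * x for w, x in zip(row, image)) for row in weights]
    return softmax(logits)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def red_loss(weights, clean, adversarial, estimate, lam=1.0):
    # Hypothetical RED-style objective: reconstruction + prediction alignment.
    # Pixel-level term: mean absolute error of the estimated clean image.
    recon = sum(abs(e - c) for e, c in zip(estimate, clean)) / len(clean)
    # Alignment 1: the estimate should be classified like the true clean image.
    align_clean = kl_divergence(classify(weights, clean),
                                classify(weights, estimate))
    # Alignment 2: the estimated perturbation, re-applied to the clean image,
    # should be classified like the observed adversarial image.
    est_delta = [a - e for a, e in zip(adversarial, estimate)]
    re_attacked = [c + d for c, d in zip(clean, est_delta)]
    align_adv = kl_divergence(classify(weights, adversarial),
                              classify(weights, re_attacked))
    return recon + lam * (align_clean + align_adv)

# Two-pixel, two-class toy example (values are illustrative only).
weights = [[1.0, -1.0], [-1.0, 1.0]]
clean = [0.6, 0.4]           # classified as class 0
adversarial = [0.45, 0.55]   # perturbed so that class 1 wins

perfect = red_loss(weights, clean, adversarial, estimate=clean)
naive = red_loss(weights, clean, adversarial, estimate=adversarial)
print("loss with perfect recovery:", perfect)
print("loss when the attack is left untouched:", naive)
```

In this sketch a perfect estimate drives all three terms to roughly zero, while returning the adversarial image unchanged is penalized both at the pixel level and at the prediction level. The spatial-transformation augmentation mentioned in the abstract is omitted here.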

Code Implementations: 2 repos