CV · IV · Feb 28, 2020

Detecting Patch Adversarial Attacks with Image Residuals

arXiv:2002.12504v2 · 0.00 · 7 citations
AI Analysis · 50

This work addresses security vulnerabilities of machine learning systems to localized adversarial attacks, but it is incremental because it builds on existing detection techniques.

The paper tackles the detection of patch-based adversarial attacks on images, using the residual left by wavelet denoising as a fingerprint. It reports that the method lowers attack success rates and raises the computational effort required of adaptive attackers.

We introduce an adversarial sample detection algorithm based on image residuals, specifically designed to guard against patch-based attacks. The image residual is obtained as the difference between an input image and a denoised version of it, and a discriminator is trained to distinguish between clean and adversarial samples. More precisely, we use a wavelet domain algorithm for denoising images and demonstrate that the obtained residuals act as a digital fingerprint for adversarial attacks. To emulate the limitations of a physical adversary, we evaluate the performance of our approach against localized (patch-based) adversarial attacks, including in settings where the adversary has complete knowledge about the detection scheme. Our results show that the proposed detection method generalizes to previously unseen, stronger attacks and that it is able to reduce the success rate (conversely, increase the computational effort) of an adaptive attacker.
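The residual described in the abstract (input image minus its denoised version) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract only says "a wavelet domain algorithm", so the single-level Haar transform with soft thresholding, the function names, and the threshold `t` are hypothetical choices, and the mean-squared residual score is a crude stand-in for the trained discriminator, not the authors' method.

```python
from typing import List

# Grayscale image as rows of floats; even height and width are assumed.
Image = List[List[float]]


def soft_threshold(x: float, t: float) -> float:
    """Shrink a wavelet coefficient toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def haar_denoise(img: Image, t: float) -> Image:
    """Single-level 2D Haar denoising (illustrative choice, not from the paper).

    Each 2x2 block is transformed, its three detail coefficients are
    soft-thresholded, and the block is reconstructed.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll = (a + b + c + d) / 2  # approximation, kept unchanged
            lh = soft_threshold((a - b + c - d) / 2, t)
            hl = soft_threshold((a + b - c - d) / 2, t)
            hh = soft_threshold((a - b - c + d) / 2, t)
            # Inverse Haar transform of the thresholded block.
            out[i][j] = (ll + lh + hl + hh) / 2
            out[i][j + 1] = (ll - lh + hl - hh) / 2
            out[i + 1][j] = (ll + lh - hl - hh) / 2
            out[i + 1][j + 1] = (ll - lh - hl + hh) / 2
    return out


def residual(img: Image, t: float) -> Image:
    """Image residual: input minus its denoised version."""
    den = haar_denoise(img, t)
    return [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(img, den)]


def residual_energy(img: Image, t: float) -> float:
    """Mean squared residual; a hypothetical stand-in for a learned discriminator."""
    res = residual(img, t)
    n = len(res) * len(res[0])
    return sum(v * v for row in res for v in row) / n
```

In this sketch a perfectly flat image produces a zero residual, while a localized high-frequency pattern leaves residual energy only inside the affected blocks, which is the kind of spatially concentrated signal a discriminator could learn from. A practical detector would use a multi-level wavelet denoiser and a classifier trained on residuals of clean and adversarial samples.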

Code Implementations: 1 repo
