CV, CR, LG | May 13, 2020

Adversarial examples are useful too!

arXiv:2005.06107v1 | 1 citation
Originality: Incremental advance
AI Analysis

This work addresses a security issue for AI practitioners by providing a detection method for training-phase attacks, though it is incremental because it builds on existing adversarial example techniques.

The paper tackles the problem of detecting backdoor attacks in deep learning models by using adversarial examples to compute and compare statistical mean maps, enabling visual identification of perturbed regions.

Deep learning has come a long way and has enjoyed unprecedented success. Despite their high accuracy, however, deep models are brittle and are easily fooled by imperceptible adversarial perturbations. In contrast to common inference-time attacks, backdoor (a.k.a. Trojan) attacks target the training phase of model construction and are extremely difficult to combat because a) the model behaves normally on a pristine testing set and b) the added perturbations can be minute and may affect only a few training samples. Here, I propose a new method to tell whether a model has been subject to a backdoor attack. The idea is to generate adversarial examples, targeted or untargeted, using conventional attacks such as FGSM and then feed them back to the classifier. By computing the statistics (here simply mean maps) of the images in different categories and comparing them with the statistics of a reference model, it is possible to visually locate the perturbed regions and unveil the attack.
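The pipeline described in the abstract (generate FGSM examples, feed them back to the classifier, average the images per assigned class, compare against a reference model) can be sketched roughly as follows. This is a toy illustration and not the paper's code: it assumes a linear softmax classifier in place of a deep network, random 4x4 "images", and a hypothetical backdoor simulated by inflating one weight. The names `fgsm`, `mean_maps`, `W_ref`, and `W_sus` are made up for this example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fgsm(W, b, x, y, eps):
    """Untargeted FGSM against a linear softmax classifier (toy stand-in for a deep net)."""
    p = softmax(x @ W + b)              # class probabilities, shape (n, k)
    p[np.arange(len(y)), y] -= 1.0      # gradient of cross-entropy w.r.t. the logits
    grad_x = p @ W.T                    # gradient w.r.t. the input, shape (n, d)
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

def mean_maps(W, b, x_adv, n_classes, shape):
    """Feed adversarial images back to the model and average them per predicted class."""
    pred = (x_adv @ W + b).argmax(axis=1)
    maps = np.zeros((n_classes,) + shape)
    for c in range(n_classes):
        if np.any(pred == c):           # leave the map at zero if no image lands in class c
            maps[c] = x_adv[pred == c].mean(axis=0).reshape(shape)
    return maps

rng = np.random.default_rng(0)
n, h, w, k, eps = 200, 4, 4, 3, 0.1
x = rng.random((n, h * w))              # synthetic images with pixels in [0, 1]
y = rng.integers(0, k, size=n)          # synthetic labels

W_ref = rng.normal(size=(h * w, k))     # assumed clean reference model
b = np.zeros(k)
W_sus = W_ref.copy()
W_sus[h * w - 1, 0] += 5.0              # hypothetical backdoor: corner pixel tied to class 0

adv_ref = fgsm(W_ref, b, x, y, eps)
adv_sus = fgsm(W_sus, b, x, y, eps)
maps_ref = mean_maps(W_ref, b, adv_ref, k, (h, w))
maps_sus = mean_maps(W_sus, b, adv_sus, k, (h, w))

# Per-class difference maps; large values suggest regions worth inspecting visually.
diff = np.abs(maps_sus - maps_ref)
```

In practice the `diff` maps would be plotted as images, one per class, and inspected by eye. How clearly a trigger region stands out depends on the model, the attack strength `eps`, and the data, so this sketch makes no claim about detection quality.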

Code Implementations: 1 repo
