CV | Dec 1, 2020

Robustness Out of the Box: Compositional Representations Naturally Defend Against Black-Box Patch Attacks

Christian Cosgrove, Adam Kortylewski, Chenglin Yang, Alan Yuille

arXiv:2012.00558v15.85 citations

Originality: Highly original

AI Analysis

This work provides a defense against black-box patch attacks on computer vision models that is more robust and more interpretable than adversarial training. It is particularly useful where adversarial training is too expensive or where interpretability is crucial.

This paper investigates defenses against black-box patch attacks and finds that adversarial training has limited effectiveness against them. In contrast, compositional deep networks, whose part-based representations make them innately robust to occlusion, resist these attacks on PASCAL3D+ and GTSRB without adversarial training, outperforming adversarially trained standard models by a large margin.

Patch-based adversarial attacks introduce a perceptible but localized change to the input that induces misclassification. While progress has been made in defending against imperceptible attacks, it remains unclear how patch-based attacks can be resisted. In this work, we study two different approaches for defending against black-box patch attacks. First, we show that adversarial training, which is successful against imperceptible attacks, has limited effectiveness against state-of-the-art location-optimized patch attacks. Second, we find that compositional deep networks, which have part-based representations that lead to innate robustness to natural occlusion, are robust to patch attacks on PASCAL3D+ and the German Traffic Sign Recognition Benchmark (GTSRB), without adversarial training. Moreover, compositional models are more robust than adversarially trained standard models by a large margin. However, on GTSRB, we observe that they have problems discriminating between similar traffic signs with fine-grained differences. We overcome this limitation by introducing part-based finetuning, which improves fine-grained recognition. By leveraging compositional representations, this is the first work that defends against black-box patch attacks without expensive adversarial training. This defense is more robust than adversarial training and more interpretable because it can locate and ignore adversarial patches.
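To make the threat model concrete, here is a minimal Python sketch of a location-searching black-box patch attack. It is an illustration under stated assumptions, not the attack evaluated in the paper: the function names (`apply_patch`, `grid_search_patch_attack`), the toy classifier `toy_predict`, and the fixed all-black patch are hypothetical, and a plain grid search stands in for the state-of-the-art location optimization the abstract refers to. The attacker only queries predicted labels and never sees gradients or weights.

```python
import numpy as np

def apply_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` pasted at row `top`, column `left`."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out

def grid_search_patch_attack(predict, image, label, patch, stride=4):
    """Try patch locations on a grid, using only the model's predicted label.

    Returns (adversarial_image, (top, left), queries_used), or
    (None, None, queries_used) if no location changes the prediction.
    """
    h, w = patch.shape[:2]
    H, W = image.shape[:2]
    queries = 0
    for top in range(0, H - h + 1, stride):
        for left in range(0, W - w + 1, stride):
            candidate = apply_patch(image, patch, top, left)
            queries += 1
            if predict(candidate) != label:  # black-box query: label only
                return candidate, (top, left), queries
    return None, None, queries

def toy_predict(image):
    """Hypothetical stand-in classifier: class 1 if the image centre is bright."""
    return int(image[12:20, 12:20].mean() > 0.5)

# Toy input: a dark 32x32 image with a bright 8x8 square in the centre.
image = np.zeros((32, 32))
image[12:20, 12:20] = 1.0
label = toy_predict(image)

patch = np.zeros((8, 8))  # fixed dark patch; real attacks also optimize its content
adv, location, queries = grid_search_patch_attack(toy_predict, image, label, patch)
```

In this toy setting the perturbation is clearly visible but confined to one 8x8 region, which is the property that distinguishes patch attacks from imperceptible ones. A defense of the kind the abstract describes would need to classify correctly from the unoccluded parts of the object.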
