LG AI CR CV MLFeb 1, 2019

The Efficacy of SHIELD under Different Threat Models

Cory Cornelius, Nilaksh Das, Shang-Tse Chen, Li Chen, Michael E. Kounavis, Duen Horng Chau

arXiv:1902.00541v27.111 citations

Originality Synthesis-oriented

AI Analysis

It addresses the robustness of adversarial defenses for image classification models, but is incremental as it builds on existing work.

This paper evaluates the SHIELD defense framework against adversarial attacks under alternative threat models, finding that training the ensemble from scratch reduces the targeted attack success rate from 64.3% to 48.9% in a full white-box scenario.

In this appraisal paper, we evaluate the efficacy of SHIELD, a compression-based defense framework for countering adversarial attacks on image classification models, which was published at KDD 2018. Here, we consider alternative threat models not studied in the original work, where we assume that an adaptive adversary is aware of the ensemble defense approach, the defensive pre-processing, and the architecture and weights of the models used in the ensemble. We define scenarios with varying levels of threat and empirically analyze the proposed defense by varying the degree of information available to the attacker, spanning from a full white-box attack to the gray-box threat model described in the original work. To evaluate the robustness of the defense against an adaptive attacker, we consider the targeted-attack success rate of the Projected Gradient Descent (PGD) attack, which is a strong gradient-based adversarial attack proposed in adversarial machine learning research. We also experiment with training the SHIELD ensemble from scratch, which is different from re-training using a pre-trained model as done in the original work. We find that the targeted PGD attack has a success rate of 64.3% against the original SHIELD ensemble in the full white box scenario, but this drops to 48.9% if the models used in the ensemble are trained from scratch instead of being retrained. Our experiments further reveal that an ensemble whose models are re-trained indeed have higher correlation in the cosine similarity space, and models that are trained from scratch are less vulnerable to targeted attacks in the white-box and gray-box scenarios.

View on arXiv PDF

Similar