Evaluating and Understanding the Robustness of Adversarial Logit Pairing
This work identifies a critical vulnerability in a proposed defense against adversarial attacks, a result relevant to researchers and practitioners in machine learning security.
The paper evaluates the robustness of Adversarial Logit Pairing, a defense against adversarial examples, and finds that a network trained with it achieves only 0.6% accuracy under the threat model the defense was designed for.
We evaluate the robustness of Adversarial Logit Pairing (ALP), a recently proposed defense against adversarial examples. We find that a network trained with ALP achieves 0.6% accuracy in the threat model in which the defense is considered. We provide a brief overview of the defense and of the threat models and claims considered, and we discuss the methodology and results of our attack, which may offer insight into why ALP is vulnerable to adversarial attack.