Erratum Concerning the Obfuscated Gradients Attack on Stochastic Activation Pruning
This erratum corrects an error in a published evaluation of an adversarial-example defense. The contribution is incremental, since it builds on existing attacks and defenses.
The authors identify a flaw in a prior re-implementation of Stochastic Activation Pruning (SAP) that artificially weakened the defense. When SAP is applied properly, the original attack is ineffective. However, a new attack built on BPDA still reduces SAP's accuracy to 0.1%.
Stochastic Activation Pruning (SAP) (Dhillon et al., 2018) is a defense against adversarial examples that was attacked and found to be broken by the "Obfuscated Gradients" paper (Athalye et al., 2018). We discover a flaw in that paper's re-implementation of SAP that artificially weakens the defense. When SAP is applied properly, the originally proposed attack is not effective. However, we show that a new use of the BPDA (Backward Pass Differentiable Approximation) attack technique can still reduce the accuracy of SAP to 0.1%.
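For readers unfamiliar with the defense, the following is a minimal sketch of magnitude-proportional stochastic pruning on a single layer, as SAP is commonly described: activations are sampled with replacement with probability proportional to their absolute value, unsampled activations are zeroed, and survivors are rescaled by the inverse of their survival probability. The function name `sap_layer`, the `num_samples` parameter, and the list-based representation are illustrative assumptions, not taken from either paper's code, and the sketch does not reproduce the flaw or the attack discussed above.

```python
import random


def sap_layer(activations, num_samples, rng=None):
    """Illustrative sketch of stochastic activation pruning on one layer.

    `activations` is a flat list of floats; `num_samples` is the number of
    draws with replacement. Both names are assumptions made for this sketch.
    """
    rng = rng or random.Random()
    total = sum(abs(a) for a in activations)
    if total == 0:
        # All activations are zero, so there is nothing to sample from.
        return list(activations)

    # Sampling probability is proportional to activation magnitude.
    probs = [abs(a) / total for a in activations]

    # Draw indices with replacement; any index drawn at least once survives.
    kept = set(rng.choices(range(len(activations)), weights=probs, k=num_samples))

    pruned = []
    for i, a in enumerate(activations):
        if i in kept:
            # Probability that unit i is drawn at least once in num_samples draws.
            keep_prob = 1.0 - (1.0 - probs[i]) ** num_samples
            # Rescale so the pruned activation is unbiased in expectation.
            pruned.append(a / keep_prob)
        else:
            pruned.append(0.0)
    return pruned


if __name__ == "__main__":
    layer = [0.0, 1.0, -2.0, 3.0]
    print(sap_layer(layer, num_samples=2, rng=random.Random(0)))
```

Because the pruning mask is resampled on every forward pass, the network's output and gradients are random, which is why an evaluation has to take care over how the stochastic layer is applied and differentiated.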