Attacks Meet Interpretability (AmI) Evaluation and Findings
For researchers evaluating adversarial defense methods, this work highlights the sensitivity of AmI to hyperparameters and offers guidance for robust evaluation.
The paper reproduces and evaluates the AmI adversarial example detection method, finding it highly dependent on hyperparameter selection but still effective against Carlini's attack with proper tuning. Recommendations for evaluating defense techniques are provided.
To investigate the effectiveness of model explanations in detecting adversarial examples, we reproduce the results of two papers: "Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples" and "Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples?". We then conduct experiments and case studies to identify the limitations of both works. We find that Attacks Meet Interpretability (AmI) is highly dependent on the selection of hyperparameters. As a result, with a different hyperparameter choice, AmI is still able to detect Nicholas Carlini's attack. Finally, we propose recommendations for future work on the evaluation of defense techniques such as AmI.