What Do Adversarially Robust Models Look At?
This paper addresses the trade-off between accuracy and robustness in machine learning models and offers insights for researchers in adversarial robustness, though its contribution is incremental because it builds on prior observations.
The paper investigates which features adversarially robust models rely on compared with standard models. It finds that robust models attend to larger-scale features and pay less attention to fine textures, and that using them as pre-trained models can have a positive effect, particularly on low-resolution datasets.
In this paper, we address the open question: "What do adversarially robust models look at?" Many recent works have reported a trade-off between standard accuracy and adversarial robustness. According to prior work, this trade-off is rooted in the fact that adversarially robust and standard accurate models might depend on very different sets of features. However, what kind of difference actually exists has not been well studied. In this paper, we analyze this difference visually and quantitatively through various experiments. Experimental results show that adversarially robust models look at things at a larger scale than standard models and pay less attention to fine textures. Furthermore, although it has been claimed that adversarially robust features are not compatible with standard accuracy, using adversarially robust models as pre-trained models even has a positive effect, particularly on low-resolution datasets.