LG AI CV · Mar 12, 2021

A Unified Game-Theoretic Interpretation of Adversarial Robustness

Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, Quanshi Zhang

arXiv:2103.07364v211.310 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work offers a principled explanation of adversarial robustness that could unify attacks and defenses, but it is incremental: it builds on existing game-theoretic interpretations without introducing new methods.

The paper tackles the problem of explaining adversarial attacks and defenses in deep neural networks by proposing a unified view based on multi-order interactions between input variables, finding that attacks affect high-order interactions while robustness in adversarially trained models stems from category-specific low-order interactions.

This paper provides a unified view to explain different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs. Based on the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide a potential method to unify adversarial perturbations and robustness, which can explain the existing defense methods in a principled way. Besides, our findings also revise the previous inaccurate understanding of the shape bias of adversarially learned features.
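
The abstract does not define "multi-order interaction" on this page. As a rough illustration only, the sketch below assumes the definition commonly used in this line of game-theoretic work: the order-m interaction between variables i and j is the average, over all contexts S of size m that exclude i and j, of v(S ∪ {i, j}) − v(S ∪ {i}) − v(S ∪ {j}) + v(S). The function name `multi_order_interaction`, the toy scoring function `toy_v`, and the exhaustive enumeration (rather than sampling) are assumptions made for this example, not the authors' code.

```python
import itertools
from statistics import mean


def multi_order_interaction(v, n, i, j, m):
    """Order-m interaction between variables i and j (illustrative sketch).

    v: function mapping a frozenset of present variable indices to a score.
    n: total number of input variables (indexed 0..n-1).
    m: context size; contexts are subsets of the other n-2 variables.
    """
    others = [k for k in range(n) if k not in (i, j)]
    if not 0 <= m <= len(others):
        raise ValueError("m must be between 0 and n - 2")
    deltas = []
    # Enumerate every context S of size m; real models would need sampling.
    for context in itertools.combinations(others, m):
        S = frozenset(context)
        deltas.append(v(S | {i, j}) - v(S | {i}) - v(S | {j}) + v(S))
    return mean(deltas)


def toy_v(S):
    """Hypothetical model output: fires only when variables 0, 1, 2 are all present."""
    return 1.0 if {0, 1, 2} <= S else 0.0


if __name__ == "__main__":
    # Interaction between variables 0 and 1 at each context size (n = 4).
    for order in range(3):
        print(order, multi_order_interaction(toy_v, 4, 0, 1, order))
```

In this toy setting the interaction between variables 0 and 1 is zero with no context and grows as the context gets larger, which is the sense in which an interaction is "high-order": it only appears when many other variables are present.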
