LG · AI · CV · Mar 12, 2021

A Unified Game-Theoretic Interpretation of Adversarial Robustness

arXiv:2103.07364v2 · 11.3 · 10 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

The paper offers a principled explanation of adversarial robustness that could unify attacks and defenses, but the contribution is incremental: it builds on existing game-theoretic interpretations without introducing new methods.

The paper explains adversarial attacks and defenses in deep neural networks through a unified view based on multi-order interactions between input variables. It finds that attacks mainly affect high-order interactions, while the robustness of adversarially trained models stems from category-specific low-order interactions.

This paper provides a unified view to explain different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs. Based on the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide a potential method to unify adversarial perturbations and robustness, which can explain the existing defense methods in a principled way. Our findings also revise the previous inaccurate understanding of the shape bias of adversarially learned features.
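The abstract does not define "multi-order interaction", so the following is only a minimal sketch under an assumption: that the order-m interaction between inputs i and j is the average, over all contexts S of exactly m other variables, of f(S ∪ {i, j}) - f(S ∪ {i}) - f(S ∪ {j}) + f(S), as is common in the game-theoretic interaction literature. The names `multi_order_interaction` and `toy_model` are hypothetical, and the toy function stands in for a DNN evaluated with absent variables masked out.

```python
from itertools import combinations
from statistics import mean


def multi_order_interaction(f, n, i, j, m):
    """Average marginal interaction of i and j over contexts S of size m.

    Assumed definition (not stated in the abstract): S ranges over all
    subsets of the n variables that exclude i and j and have |S| = m.
    """
    others = [k for k in range(n) if k not in (i, j)]
    deltas = []
    for context in combinations(others, m):
        S = frozenset(context)
        deltas.append(f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S))
    return mean(deltas)


def toy_model(present):
    """Hypothetical stand-in for a DNN output given the set of unmasked inputs."""
    pair = 1.0 if {0, 1} <= present else 0.0        # simple two-variable pattern
    quad = 1.0 if {0, 1, 2, 3} <= present else 0.0  # pattern needing all four inputs
    return pair + quad


# Interaction between variables 0 and 1 at orders m = 0, 1, 2 (n = 4 inputs).
profile = [multi_order_interaction(toy_model, 4, 0, 1, m) for m in range(3)]
print(profile)
```

In this toy setting the pattern that needs all four inputs only contributes at the highest order (m = 2), which illustrates what "high-order" means here: collaboration between two variables that appears only in large contexts. The exact enumeration is exponential in n, so real images would need sampled contexts.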

Code Implementations · 1 repo