Game-Theoretic Understanding of Misclassification
This work offers researchers in adversarial robustness and model interpretability insight into model malfunctions, but its contribution is incremental: it extends existing game-theoretic methods to new contexts.
The paper tackles the problem of understanding image misclassification in deep learning models by analyzing clean, adversarial, and corrupted images through game-theoretic interactions. It finds that misclassified adversarial images have stronger high-order interactions, and misclassified corrupted images weaker low-order interactions, than correctly classified clean images. It also extends the analysis to Vision Transformers, which show different interaction patterns from CNNs.
This paper analyzes various types of image misclassification from a game-theoretic view. Particularly, we consider the misclassification of clean, adversarial, and corrupted images and characterize it through the distribution of multi-order interactions. We discover that the distribution of multi-order interactions varies across the types of misclassification. For example, misclassified adversarial images have a higher strength of high-order interactions than correctly classified clean images, which indicates that adversarial perturbations create spurious features that arise from complex cooperation between pixels. By contrast, misclassified corrupted images have a lower strength of low-order interactions than correctly classified clean images, which indicates that corruptions break the local cooperation between pixels. We also provide the first analysis of Vision Transformers using interactions. We find that Vision Transformers show a different tendency in the distribution of interactions from that of CNNs, which implies that they exploit features that CNNs do not use for prediction. Our study demonstrates that the recent game-theoretic analysis of deep learning models can be broadened to analyze various malfunctions of deep learning models, including Vision Transformers, by using the distribution, order, and sign of interactions.
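The abstract does not define a multi-order interaction, so the following is only a minimal sketch under an assumed definition, the one commonly used in the game-theoretic interpretability literature: the order-m interaction between players i and j is the average, over contexts S of exactly m other players, of f(S ∪ {i, j}) − f(S ∪ {i}) − f(S ∪ {j}) + f(S). The function name `multi_order_interaction` and the three toy games are hypothetical illustrations, not the authors' code. For images, the players would presumably be pixels or patches and f a model score on the image with all players outside the coalition masked; that mapping is also an assumption here.

```python
import random


def multi_order_interaction(f, n, i, j, m, num_samples=200, seed=0):
    """Monte Carlo estimate of the order-m interaction between players i and j.

    f maps a frozenset of player indices (a coalition) to a real score.
    Contexts S are sampled uniformly from subsets of size m that exclude i and j.
    """
    rng = random.Random(seed)
    others = [k for k in range(n) if k not in (i, j)]
    if not 0 <= m <= len(others):
        raise ValueError("order m must lie in [0, n - 2]")
    total = 0.0
    for _ in range(num_samples):
        S = frozenset(rng.sample(others, m))
        # Marginal effect of adding i and j together, minus their separate effects.
        total += f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S)
    return total / num_samples


N_PLAYERS = 6


def additive(S):
    # Toy game with no cooperation: every player contributes independently.
    return float(len(S))


def pairwise(S):
    # Toy game where players 0 and 1 cooperate regardless of context.
    return 1.0 if {0, 1} <= S else 0.0


def all_and(S):
    # Toy game that pays off only when every player is present.
    return 1.0 if len(S) == N_PLAYERS else 0.0


if __name__ == "__main__":
    for name, game in [("additive", additive), ("pairwise", pairwise), ("all_and", all_and)]:
        profile = [multi_order_interaction(game, N_PLAYERS, 0, 1, m) for m in range(N_PLAYERS - 1)]
        print(name, profile)
```

In these toy games the additive score has no interaction at any order, the pairwise score has the same interaction at every order, and the all-players score interacts only at the highest order (m = n − 2). That last profile is the kind of high-order cooperation the abstract associates with adversarial perturbations, while a drop at small m would correspond to the loss of local cooperation it attributes to corruptions.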