CV | Jul 4, 2023

Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection

arXiv:2307.02500v2 | 1 citation | h-index: 1

AI Analysis

This paper addresses the problem of deploying interpretable and robust AI models in real-world applications, though its contribution in linking robustness and interpretability is incremental.

The study tackled the challenge of maintaining interpretability in complex deep neural networks by evaluating whether adversarial training, which enhances robustness against attacks, also improves interpretability in computer vision models. It found that, compared to standard models, robust models focus on more meaningful image regions and learn features closer to real ones.

As state-of-the-art deep neural networks grow ever more complex, maintaining their interpretability becomes increasingly challenging. Our work evaluates the effects of adversarial training, which is used to produce robust models that are less vulnerable to adversarial attacks and which has been shown to make computer vision models more interpretable. Interpretability is as essential as robustness when models are deployed in the real world. To demonstrate the correlation between these two properties, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature visualization techniques (Representation Inversion, Class Specific Image Generation). Compared to robust models, standard models are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans. Robust models, conversely, focus on distinctive regions of the images that support their predictions. Moreover, the features learned by the robust model are closer to the real ones.
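
To make one of the feature-importance methods named above concrete, the following is a minimal sketch of Integrated Gradients on a toy logistic model. It is not the paper's code or experimental setup: the model, the weights `w` and `b`, the input `x`, and the all-zero `baseline` are hypothetical values chosen only for illustration, and the path integral is approximated with a midpoint Riemann sum.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy logistic 'classifier': probability of the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the model output with respect to each input feature."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients along the straight path baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input's displacement from the baseline.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

# Hypothetical weights and input, chosen only for illustration.
w = [2.0, -1.0, 0.0]
b = 0.1
x = [1.0, 0.5, 3.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
print(attributions)
```

In this sketch the third feature has zero weight and therefore receives zero attribution, and the attributions sum (up to discretization error) to `model(x) - model(baseline)`, which is the completeness property of Integrated Gradients. For an image model the same procedure would be applied per pixel, with gradients obtained by automatic differentiation rather than analytically.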
