LG · CR · CV · Nov 9, 2022

On the Robustness of Explanations of Deep Neural Network Models: A Survey

arXiv:2211.04780v1 · 10 citations · h-index: 37
Originality: Synthesis-oriented
AI Analysis

This is an incremental work that synthesizes existing research on explanation robustness for stakeholders in risk-sensitive and safety-critical domains.

This survey addresses the problem of robustness in explanations for deep neural network models by compiling and reviewing methods, metrics, attacks, and defenses, concluding with community guidelines for ensuring reliable explanations.

Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.

