CL | Oct 13, 2022

Saliency Map Verbalization: Comparing Feature Importance Representations from Model-free and Instruction-based Methods

arXiv:2210.07222v3230 citations | h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses the challenge of explaining neural model predictions to laypeople, offering incremental improvements in comprehensibility over existing methods.

The paper tackles the problem of making saliency maps more interpretable by translating them into natural language, comparing two novel methods (instruction-based and search-based verbalization) against conventional representations. The instruction-based method using GPT-3.5 achieved the highest human ratings for helpfulness and ease of understanding but did not faithfully capture numeric information, while the search-based method was faithful by design but less helpful.

Saliency maps can explain a neural model's predictions by identifying important input features. They are difficult to interpret for laypeople, especially for instances with many features. In order to make them more accessible, we formalize the underexplored task of translating saliency maps into natural language and compare methods that address two key challenges of this approach: what and how to verbalize. In both automatic and human evaluation setups, using token-level attributions from text classification tasks, we compare two novel methods (search-based and instruction-based verbalizations) against conventional feature importance representations (heatmap visualizations and extractive rationales), measuring simulatability, faithfulness, helpfulness and ease of understanding. Instructing GPT-3.5 to generate saliency map verbalizations yields plausible explanations which include associations, abstractive summarization and commonsense reasoning, achieving by far the highest human ratings, but they do not faithfully capture numeric information and are inconsistent in their interpretation of the task. In comparison, our search-based, model-free verbalization approach efficiently completes templated verbalizations and is faithful by design, but falls short in helpfulness and simulatability. Our results suggest that saliency map verbalization makes feature attribution explanations more comprehensible and less cognitively challenging to humans than conventional representations.
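
To make the idea of a templated, model-free verbalization concrete, here is a minimal sketch that turns token-level attributions into one sentence. It is an illustration only and not the paper's search-based method: the function name `verbalize_saliency`, the template wording, the top-k selection by absolute score, and the percentage normalization are all assumptions made for this example.

```python
from typing import List


def verbalize_saliency(tokens: List[str], scores: List[float],
                       label: str, k: int = 3) -> str:
    """Fill a fixed sentence template with the k most salient tokens.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not tokens:
        return f"No input tokens were available to explain the prediction '{label}'."

    # Rank tokens by absolute attribution and keep the top k (assumed heuristic).
    ranked = sorted(zip(tokens, scores), key=lambda p: abs(p[1]), reverse=True)[:k]

    # Express each score as a share of the total absolute attribution, so the
    # numbers in the sentence are derived directly from the saliency map.
    total = sum(abs(s) for s in scores)
    parts = []
    for tok, score in ranked:
        share = 100 * abs(score) / total if total else 0.0
        direction = "supports" if score >= 0 else "speaks against"
        parts.append(f"'{tok}' ({direction}, {share:.0f}% of total attribution)")

    return (f"For the prediction '{label}', the most important words are "
            + ", ".join(parts) + ".")


if __name__ == "__main__":
    print(verbalize_saliency(["the", "movie", "was", "great"],
                             [0.0, 0.1, 0.0, 0.4], "positive", k=2))
```

Because every number in the sentence is computed from the attribution scores, a template of this kind cannot misreport them, which is the sense in which a templated approach can be faithful by construction. The trade-off is that the output is rigid and offers none of the associations or summarization a language model might add.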

Code Implementations: 1 repo