CL · AI · Oct 9, 2023

Dynamic Top-k Estimation Consolidates Disagreement between Feature Attribution Methods

arXiv:2310.05619v2134 citations
h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the challenge of selecting meaningful token highlights for human interpretation in NLP explainability, offering an incremental improvement over static methods.

The paper asks how many tokens to highlight when feature attribution scores are used to explain a text classifier's prediction. It proposes a dynamic k derived from sequential properties of the scores. Compared with a fixed k, dynamic k mainly improves agreement for Integrated Gradient and GradientXInput, and the advantage that perturbation-based methods and Vanilla Gradient hold under a static k disappears.

Feature attribution scores are used for explaining the prediction of a text classifier to users by highlighting a k number of tokens. In this work, we propose a way to determine the number of optimal k tokens that should be displayed from sequential properties of the attribution scores. Our approach is dynamic across sentences, method-agnostic, and deals with sentence length bias. We compare agreement between multiple methods and humans on an NLI task, using fixed k and dynamic k. We find that perturbation-based methods and Vanilla Gradient exhibit highest agreement on most method–method and method–human agreement metrics with a static k. Their advantage over other methods disappears with dynamic ks which mainly improve Integrated Gradient and GradientXInput. To our knowledge, this is the first evidence that sequential properties of attribution scores are informative for consolidating attribution signals for human interpretation.
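
The abstract does not say which sequential property of the scores is used to set k, so the following is only a minimal sketch under a stated assumption: k is taken to be the number of local maxima in a sentence's attribution profile. The function names (`local_maxima`, `dynamic_k`, `top_k`, `agreement`), the example scores, and the use of Jaccard overlap as an agreement measure are all hypothetical illustrations, not the paper's confirmed method or metrics.

```python
from typing import List, Set


def local_maxima(scores: List[float]) -> List[int]:
    """Return indices whose score is strictly greater than both neighbours.

    Missing neighbours at the sentence edges are treated as -inf
    (an assumption made for this sketch).
    """
    peaks = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i < len(scores) - 1 else float("-inf")
        if s > left and s > right:
            peaks.append(i)
    return peaks


def dynamic_k(scores: List[float]) -> int:
    """Assumed rule: k equals the number of local maxima, with a floor of 1."""
    return max(1, len(local_maxima(scores)))


def top_k(scores: List[float], k: int) -> Set[int]:
    """Indices of the k highest-scoring tokens."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def agreement(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap between two highlighted token sets (illustrative metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up attribution scores from two hypothetical methods for one sentence.
method_a = [0.1, 0.7, 0.2, 0.05, 0.6, 0.3]
method_b = [0.3, 0.2, 0.1, 0.4, 0.9, 0.5]

highlights_a = top_k(method_a, dynamic_k(method_a))
highlights_b = top_k(method_b, dynamic_k(method_b))
overlap = agreement(highlights_a, highlights_b)
```

Because k is computed per sentence and per score sequence, it can differ across sentences and across attribution methods, which is the contrast with a fixed k that the abstract draws.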

