LG Jul 20, 2021

Shared Interest: Measuring Human-AI Alignment to Identify Recurring Patterns in Model Behavior

Angie Boggust, Benjamin Hoover, Arvind Satyanarayan, Hendrik Strobelt

arXiv:2107.09234v2 15.564 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the tedious, ad hoc manual inspection that model interpretability currently demands of researchers and practitioners, offering metrics that help decide whether a model is trustworthy and uncover issues missed in manual analysis.

The paper tackles the challenge of interpreting saliency methods for neural networks by introducing Shared Interest, a set of metrics that quantitatively compare model reasoning to human reasoning, enabling systematic analysis and identification of eight recurring patterns in model behavior.

Saliency methods -- techniques to identify the importance of input features on a model's output -- are a common step in understanding neural network behavior. However, interpreting saliency requires tedious manual inspection to identify and aggregate patterns in model behavior, resulting in ad hoc or cherry-picked analysis. To address these concerns, we present Shared Interest: metrics for comparing model reasoning (via saliency) to human reasoning (via ground truth annotations). By providing quantitative descriptors, Shared Interest enables ranking, sorting, and aggregating inputs, thereby facilitating large-scale systematic analysis of model behavior. We use Shared Interest to identify eight recurring patterns in model behavior, such as cases where contextual features or a subset of ground truth features are most important to the model. Working with representative real-world users, we show how Shared Interest can be used to decide if a model is trustworthy, uncover issues missed in manual analyses, and enable interactive probing.
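The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the metric definitions, so the code below assumes set-overlap scores between a binarized saliency region and a ground truth annotation, each represented as a set of feature indices. The function name `overlap_scores`, the score names, and the toy inputs are all hypothetical and are not taken from the authors' code.

```python
def overlap_scores(saliency, ground_truth):
    """Compare a saliency region to a ground truth annotation.

    Both arguments are collections of feature indices (for example,
    pixel positions). This representation is an assumption of the sketch.
    """
    s, g = set(saliency), set(ground_truth)
    inter = len(s & g)
    union = len(s | g)
    return {
        # Overall agreement between the two regions.
        "iou": inter / union if union else 0.0,
        # Fraction of the annotated features that the saliency covers.
        "ground_truth_coverage": inter / len(g) if g else 0.0,
        # Fraction of the salient features that fall inside the annotation.
        "saliency_coverage": inter / len(s) if s else 0.0,
    }


# Hypothetical toy inputs: (saliency features, ground truth features).
examples = {
    "img_a": ({1, 2, 3, 4}, {1, 2, 3, 4}),  # saliency matches the annotation
    "img_b": ({1, 2}, {1, 2, 3, 4}),        # saliency is a subset of the annotation
    "img_c": ({7, 8, 9}, {1, 2, 3, 4}),     # saliency lies entirely on context
}

scores = {name: overlap_scores(s, g) for name, (s, g) in examples.items()}

# Quantitative scores make it possible to rank inputs instead of
# inspecting them one by one; here, lowest agreement first.
ranked = sorted(scores, key=lambda name: scores[name]["iou"])
```

Sorting by a single score surfaces the least aligned inputs first. Combining scores, for instance high saliency coverage with low ground truth coverage, would pick out cases where the model relies on only a subset of the annotated features.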
