LG · Jul 20, 2021

Shared Interest: Measuring Human-AI Alignment to Identify Recurring Patterns in Model Behavior

arXiv:2107.09234v2 · 64 citations
Originality: Incremental advance
AI Analysis

This work targets the tedious, ad hoc manual inspection that researchers and practitioners rely on when interpreting models, offering metrics that help them judge whether a model is trustworthy and uncover issues in its behavior.

The paper tackles the challenge of interpreting saliency methods for neural networks by introducing Shared Interest, a set of metrics that quantitatively compare model reasoning to human reasoning, enabling systematic analysis and identification of eight recurring patterns in model behavior.

Saliency methods -- techniques to identify the importance of input features on a model's output -- are a common step in understanding neural network behavior. However, interpreting saliency requires tedious manual inspection to identify and aggregate patterns in model behavior, resulting in ad hoc or cherry-picked analysis. To address these concerns, we present Shared Interest: metrics for comparing model reasoning (via saliency) to human reasoning (via ground truth annotations). By providing quantitative descriptors, Shared Interest enables ranking, sorting, and aggregating inputs, thereby facilitating large-scale systematic analysis of model behavior. We use Shared Interest to identify eight recurring patterns in model behavior, such as cases where contextual features or a subset of ground truth features are most important to the model. Working with representative real-world users, we show how Shared Interest can be used to decide if a model is trustworthy, uncover issues missed in manual analyses, and enable interactive probing.
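The idea of scoring agreement between saliency and annotations can be illustrated with a minimal sketch. The abstract does not spell out the metric definitions, so the three set-overlap scores below (`iou`, `ground_truth_coverage`, `saliency_coverage`) are assumptions chosen for illustration, as are the function name `overlap_scores` and the toy inputs. Features are represented as sets of hashable ids (pixel coordinates here), on the assumption that saliency has already been thresholded into a discrete set of important features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlapScores:
    """Hypothetical overlap scores between salient and annotated features."""
    iou: float                    # |S ∩ G| / |S ∪ G|
    ground_truth_coverage: float  # |S ∩ G| / |G|
    saliency_coverage: float      # |S ∩ G| / |S|


def overlap_scores(salient, ground_truth):
    """Compare a set of salient features to a set of annotated features."""
    s, g = set(salient), set(ground_truth)
    inter = len(s & g)
    union = len(s | g)
    return OverlapScores(
        iou=inter / union if union else 0.0,
        ground_truth_coverage=inter / len(g) if g else 0.0,
        saliency_coverage=inter / len(s) if s else 0.0,
    )


# Toy inputs (made up): name -> (salient pixels, annotated pixels).
examples = {
    # Saliency matches the annotation exactly.
    "img_a": ({(0, 0), (0, 1)}, {(0, 0), (0, 1)}),
    # Saliency covers only a subset of the annotated region.
    "img_b": ({(0, 0)}, {(0, 0), (0, 1), (1, 0), (1, 1)}),
    # Saliency falls entirely on context outside the annotation.
    "img_c": ({(5, 5), (5, 6)}, {(0, 0), (0, 1)}),
}

scores = {name: overlap_scores(s, g) for name, (s, g) in examples.items()}

# Sort inputs from least to most agreement, so the cases most worth
# inspecting by hand come first.
ranked = sorted(scores, key=lambda name: scores[name].iou)

for name in ranked:
    print(name, scores[name])
```

Because each input reduces to a few numbers, inputs can be sorted, filtered, or bucketed instead of inspected one at a time. In this toy data, a high `saliency_coverage` with a low `ground_truth_coverage` (as in `img_b`) flags a model relying on a subset of the annotated features, while all-zero scores (as in `img_c`) flag reliance on context.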

Code Implementations: 1 repo
