LG Jul 6, 2017

A causal framework for explaining the predictions of black-box sequence-to-sequence models

arXiv:1707.01943v3
1209 citations
Originality: Incremental advance
AI Analysis

The paper provides explanations for the predictions of black-box NLP models. The advance is incremental because it builds on existing perturbation-based interpretability techniques.

The paper tackles the problem of interpreting black-box sequence-to-sequence models by developing a method that identifies causally related input-output token groups through perturbations and graph partitioning, tested across NLP tasks.

We interpret the predictions of any black-box structured input-structured output model around a specific input-output pair. Our method returns an "explanation" consisting of groups of input-output tokens that are causally related. These dependencies are inferred by querying the black-box model with perturbed inputs, generating a graph over tokens from the responses, and solving a partitioning problem to select the most relevant components. We focus the general approach on sequence-to-sequence problems, adopting a variational autoencoder to yield meaningful input perturbations. We test our method across several NLP sequence generation tasks.
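The pipeline in the abstract (perturb the input, query the black box, build a graph over tokens, partition it) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: `toy_black_box` is a hypothetical word-by-word lookup standing in for a real model, perturbation is random masking rather than a variational autoencoder, edge weights are a plain difference of empirical frequencies, and the partitioning step is replaced by thresholding followed by connected components. All function names and parameters are illustrative.

```python
import random
from collections import defaultdict


def toy_black_box(tokens):
    """Hypothetical stand-in for the black-box model: a word-by-word lookup."""
    lexicon = {"the": "le", "cat": "chat", "sleeps": "dort"}
    return [lexicon[t] for t in tokens if t in lexicon]


def dependency_graph(model, tokens, n_samples=500, drop_prob=0.3, seed=0):
    """Estimate input-output dependencies by querying the model on perturbed inputs.

    Assumption: the weight of edge (i, j) is
    P(output j present | input i kept) - P(output j present | input i masked).
    """
    rng = random.Random(seed)
    output = model(tokens)
    kept = [0] * len(tokens)
    masked = [0] * len(tokens)
    present_kept = defaultdict(int)
    present_masked = defaultdict(int)
    for _ in range(n_samples):
        # Random masking: a crude substitute for VAE-based perturbation.
        mask = [rng.random() > drop_prob for _ in tokens]
        perturbed = [t if keep else "<unk>" for t, keep in zip(tokens, mask)]
        response = set(model(perturbed))
        for i, keep in enumerate(mask):
            if keep:
                kept[i] += 1
            else:
                masked[i] += 1
            for j, out_tok in enumerate(output):
                if out_tok in response:
                    (present_kept if keep else present_masked)[(i, j)] += 1
    weights = {}
    for i in range(len(tokens)):
        for j in range(len(output)):
            if kept[i] and masked[i]:  # skip pairs with no evidence on one side
                weights[(i, j)] = (present_kept[(i, j)] / kept[i]
                                   - present_masked[(i, j)] / masked[i])
    return output, weights


def explain(model, tokens, threshold=0.5, n_samples=500, seed=0):
    """Group input and output tokens linked by strong estimated dependencies.

    Assumption: thresholding plus connected components replaces the
    partitioning problem described in the abstract.
    """
    output, weights = dependency_graph(model, tokens, n_samples, seed=seed)
    adj = defaultdict(set)
    for (i, j), w in weights.items():
        if w >= threshold:
            adj[("in", i)].add(("out", j))
            adj[("out", j)].add(("in", i))
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first search over the bipartite graph
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        ins = [tokens[k] for side, k in sorted(component) if side == "in"]
        outs = [output[k] for side, k in sorted(component) if side == "out"]
        groups.append((ins, outs))
    return groups


if __name__ == "__main__":
    for ins, outs in explain(toy_black_box, ["the", "cat", "sleeps"]):
        print(ins, "->", outs)
```

On this toy model, each input word should end up grouped with its own translation, since masking a word removes exactly one output token. A real sequence-to-sequence model would produce noisier dependencies, which is where a principled partitioning step becomes important.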

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
