CL, LG, NE | Aug 10, 2021

Post-hoc Interpretability for Neural NLP: A Survey

arXiv:2108.04840v5 | 312 citations
AI Analysis

This is an incremental survey addressing the need for accountability and safety in complex NLP models by reviewing existing interpretability techniques.

The paper surveys post-hoc interpretability methods for neural NLP models, categorizing how they communicate explanations to humans and discussing their validation.

Neural networks for NLP are becoming increasingly complex and widespread, and there is growing concern about whether these models can be used responsibly. Explaining models helps to address safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Additionally, post-hoc methods provide explanations after a model is learned and are generally model-agnostic. This survey categorizes how recent post-hoc interpretability methods communicate explanations to humans, discusses each method in depth, and describes how each is validated, since validation is a common concern.

