CL · Oct 20, 2021

Interpreting Deep Learning Models in Natural Language Processing: A Review

arXiv:2110.10470v2 · 53 citations
Originality: Synthesis-oriented
AI Analysis

The paper tackles the problem of interpretability in neural NLP models for researchers and practitioners, but its contribution is incremental: it synthesizes existing work without introducing new methods.

This survey reviews interpretation methods for deep learning models in natural language processing (NLP). It addresses the lack of interpretability that limits the reliability of neural NLP systems and their use in critical areas such as healthcare, categorizes methods into training-based, test-based, and hybrid approaches, and notes current deficiencies and directions for future research.

Neural network models have achieved state-of-the-art performance in a wide range of natural language processing (NLP) tasks. However, a long-standing criticism of neural network models is their lack of interpretability, which not only reduces the reliability of neural NLP systems but also limits the scope of their applications in areas where interpretability is essential (e.g., health care applications). In response, the increasing interest in interpreting neural NLP models has spurred a diverse array of interpretation methods over recent years. In this survey, we provide a comprehensive review of various interpretation methods for neural models in NLP. We first sketch out a high-level taxonomy for interpretation methods in NLP, i.e., training-based approaches, test-based approaches, and hybrid approaches. Next, we describe sub-categories in each category in detail, e.g., influence-function-based methods, KNN-based methods, attention-based models, saliency-based methods, and perturbation-based methods. We point out deficiencies of current methods and suggest some avenues for future research.
