Improving Interpretability via Explicit Word Interaction Graph Layer
This work addresses the need for better interpretability in NLP models, which matters to both researchers and practitioners, though the contribution appears incremental because it builds on existing neural network architectures.
To improve the interpretability of NLP models, the authors propose a trainable neural network layer that learns a global interaction graph between words and uses it to select more informative words. They report that the layer substantially improves both interpretability and prediction performance across multiple SOTA models and datasets.
Recent NLP literature has seen growing interest in improving model interpretability. Along this direction, we propose a trainable neural network layer that learns a global interaction graph between words and then selects more informative words using the learned word interactions. Our layer, which we call WIGRAPH, can be plugged into any neural network-based NLP text classifier right after its word embedding layer. Across multiple SOTA NLP models and various NLP datasets, we demonstrate that adding the WIGRAPH layer substantially improves the models' interpretability and enhances their prediction performance at the same time.
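To make the idea of a layer placed right after the word embeddings more concrete, the following is a minimal sketch, not the WIGRAPH formulation itself, which the abstract does not spell out. It assumes that the global interaction graph is a vocabulary-by-vocabulary score matrix, that each word's embedding is mixed with the other words in the sentence using softmax-normalised scores, and that word selection is a soft sigmoid gate per word. The function name `wigraph_like_layer` and the parameters `interaction` and `importance` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def wigraph_like_layer(token_ids, embeddings, interaction, importance):
    """Illustrative sketch only (assumed formulation, not the paper's).

    token_ids   : list[int], vocabulary indices of one input sentence
    embeddings  : V x d list of lists, one embedding row per vocabulary word
    interaction : V x V list of lists, global word-word interaction scores
                  (would be trainable in a real model)
    importance  : length-V list of per-word selection logits
                  (would be trainable in a real model)
    """
    dim = len(embeddings[0])
    outputs = []
    for i in token_ids:
        # Restrict the global graph to the words present in this sentence.
        scores = [interaction[i][j] for j in token_ids]
        # Softmax over the sentence's words (shifted by the max for stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Mix embeddings of the sentence's words according to the weights.
        mixed = [
            sum(w * embeddings[j][k] for w, j in zip(weights, token_ids))
            for k in range(dim)
        ]
        # Soft word selection: a gate near 0 suppresses the word.
        gate = sigmoid(importance[i])
        outputs.append([gate * v for v in mixed])
    return outputs
```

In this sketch the output has one vector per input token, so it could be passed to the downstream classifier in place of the original embeddings. The gate values offer one possible handle for inspecting which words the model treats as informative, under the assumptions stated above.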