Neural Networks as Explicit Word-Based Rules
This work provides a method for interpreting neural networks in NLP, which is incremental as it adapts visualization techniques from computer vision to a new domain.
The authors tackled the problem of interpreting weight matrices in convolutional networks for NLP tasks by visualizing them as word-based rules, and they recovered the original model's performance using these rules.
Filters of convolutional networks used in computer vision are often visualized as image patches that maximize the response of the filter. We use the same approach to interpret weight matrices in simple architectures for natural language processing tasks. We interpret a convolutional network for sentiment classification as word-based rules. Using the rule, we recover the performance of the original model.