CL, Sep 3, 2017

Investigating how well contextual features are captured by bi-directional recurrent neural network models

arXiv:1709.00659v21088 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the interpretability of neural models for NLP and provides tools for error analysis. It is incremental: it builds on existing RNN methods without introducing a new paradigm.

The paper examines how well bi-directional recurrent neural networks capture contextual features in sequence tagging tasks. It defines three analysis methods and evaluates them on general and biomedical datasets.

Learning algorithms for natural language processing (NLP) tasks traditionally rely on manually defined contextual features. Neural network models that use only a distributional representation of words, on the other hand, have been applied successfully to several NLP tasks. Such models learn features automatically and avoid explicit feature engineering. Across several domains, neural models become a natural choice, particularly when little is known about the characteristics of the data. However, this flexibility comes at the cost of interpretability. In this paper, we define three different methods to investigate the ability of bi-directional recurrent neural networks (RNNs) to capture contextual features. In particular, we analyze RNNs for sequence tagging tasks. We perform a comprehensive analysis on general as well as biomedical domain datasets. Our experiments focus on important contextual words as features, and the approach can easily be extended to analyze various other feature types. We also investigate positional effects of context words and show how the developed methods can be used for error analysis.

