CL · Nov 22, 2017

Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data?

arXiv:1711.08231v3 · 1094 citations
Originality: Incremental advance
AI Analysis

The paper addresses a limitation of existing neural models, which struggle to capture longer-distance tag dependencies in sequence labeling tasks such as chunking and NER. It represents an incremental improvement.

The paper tackles the problem of capturing long-distance tag dependencies in sequence labeling by proposing a Multi-Order BiLSTM (MO-BiLSTM) model that combines low-order and high-order LSTMs and uses pruning for scalability. It reports state-of-the-art results in chunking and highly competitive results on two NER datasets.

Existing neural models usually predict the tag of the current token independently of the neighboring tags. The popular LSTM-CRF model considers the dependency between every two consecutive tags. However, it is hard for existing neural models to take longer-distance tag dependencies into consideration. Scalability is mainly limited by complex model structures and the cost of dynamic programming during training. In our work, we first design a new model called "high order LSTM", which predicts multiple tags for the current token: not only the current tag but also the previous several tags. We call the number of tags in one prediction the "order". We then propose a new method called Multi-Order BiLSTM (MO-BiLSTM), which combines low order and high order LSTMs. MO-BiLSTM remains scalable to high order models by using a pruning technique. We evaluate MO-BiLSTM on all-phrase chunking and NER datasets. Experimental results show that MO-BiLSTM achieves the state-of-the-art result in chunking and highly competitive results on two NER datasets.
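To make the notion of "order" concrete, the following minimal sketch shows one way the prediction targets of a high order model could be built from an ordinary tag sequence: at each position, the target is the tuple of the previous `order - 1` tags plus the current tag. This is an illustration of the idea in the abstract, not the authors' implementation. The function name `high_order_labels`, the `START` padding symbol, and the example tags are all assumptions introduced here.

```python
from typing import List, Tuple

# Hypothetical padding symbol for positions before the first token;
# the abstract does not say how sentence boundaries are handled.
START = "<s>"


def high_order_labels(tags: List[str], order: int) -> List[Tuple[str, ...]]:
    """For each position, return the previous (order - 1) tags plus the current tag."""
    if order < 1:
        raise ValueError("order must be >= 1")
    # Left-pad so that every position has a full window of `order` tags.
    padded = [START] * (order - 1) + list(tags)
    return [tuple(padded[i:i + order]) for i in range(len(tags))]


# Illustrative chunking-style tag sequence (not taken from the paper).
tags = ["B-NP", "I-NP", "B-VP", "O"]

order1 = high_order_labels(tags, 1)  # ordinary per-token tagging
order2 = high_order_labels(tags, 2)  # previous tag + current tag
order3 = high_order_labels(tags, 3)  # previous two tags + current tag

for name, labels in [("order 1", order1), ("order 2", order2), ("order 3", order3)]:
    print(name, labels)
```

With a tag set of size |T|, the number of possible order-n targets is at most |T|^n, so the output space grows quickly with the order. That growth is one plausible reason a pruning step is needed to keep high order models tractable, though the abstract does not describe the pruning procedure itself.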

Code Implementations: 1 repo
