LGCLMLSep 28, 2018

Embedded-State Latent Conditional Random Fields for Sequence Labeling

arXiv:1809.10835v11097 citations
Originality Incremental advance
AI Analysis

This addresses the need for more accurate information extraction in NLP by enhancing sequence labeling with better handling of non-local output constraints, though it is incremental as it builds on existing CRF and RNN methods.

The paper tackles the problem of complex sequence labeling tasks where non-local constraints between output labels exist, by introducing a model that integrates RNN features with a more expressive global graphical model using embedded latent states, resulting in improved performance over baseline CRF+RNN models when global constraints are necessary.

Complex textual information extraction tasks are often posed as sequence labeling or \emph{shallow parsing}, where fields are extracted using local labels made consistent through probabilistic inference in a graphical model with constrained transitions. Recently, it has become common to locally parametrize these models using rich features extracted by recurrent neural networks (such as LSTM), while enforcing consistent outputs through a simple linear-chain model, representing Markovian dependencies between successive labels. However, the simple graphical model structure belies the often complex non-local constraints between output labels. For example, many fields, such as a first name, can only occur a fixed number of times, or in the presence of other fields. While RNNs have provided increasingly powerful context-aware local features for sequence tagging, they have yet to be integrated with a global graphical model of similar expressivity in the output distribution. Our model goes beyond the linear chain CRF to incorporate multiple hidden states per output label, but parametrizes their transitions parsimoniously with low-rank log-potential scoring matrices, effectively learning an embedding space for hidden states. This augmented latent space of inference variables complements the rich feature representation of the RNN, and allows exact global inference obeying complex, learned non-local output constraints. We experiment with several datasets and show that the model outperforms baseline CRF+RNN models when global output constraints are necessary at inference-time, and explore the interpretable latent structure.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes