CL · Oct 9, 2020

Constrained Decoding for Computationally Efficient Named Entity Recognition Taggers

arXiv:2010.04362v1997 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses slow training of NER systems, which benefits researchers and practitioners. The advance is incremental because it builds on existing span encoding schemes.

The paper tackles the computational inefficiency of CRF-based NER models by constraining the output to suppress illegal tag transitions. A tagger trained this way with a cross-entropy loss trains twice as fast as a CRF, with statistically insignificant differences in F1, which effectively eliminates the need for a CRF.

Current state-of-the-art models for named entity recognition (NER) are neural models with a conditional random field (CRF) as the final layer. Entities are represented as per-token labels with a special structure in order to decode them into spans. Current work eschews prior knowledge of how the span encoding scheme works and relies on the CRF learning which transitions are illegal and which are not to facilitate global coherence. We find that by constraining the output to suppress illegal transitions we can train a tagger with a cross-entropy loss twice as fast as a CRF with differences in F1 that are statistically insignificant, effectively eliminating the need for a CRF. We analyze the dynamics of tag co-occurrence to explain when these constraints are most effective and provide open source implementations of our tagger in both PyTorch and TensorFlow.
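The idea of suppressing illegal transitions can be illustrated with a minimal pure-Python sketch. It is not the authors' PyTorch or TensorFlow implementation. It assumes a BIO span encoding, per-token scores from some tagger trained with cross-entropy, and hypothetical helper names (`is_legal`, `build_mask`, `constrained_decode`). Illegal transitions receive a score of negative infinity, legal ones a score of zero, and a Viterbi pass picks the best legal tag sequence.

```python
import math

def is_legal(prev, curr):
    """BIO rule (assumed scheme): I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        entity = curr[2:]
        return prev in ("B-" + entity, "I-" + entity)
    return True

def build_mask(tags):
    """mask[i][j] is 0.0 if tags[i] -> tags[j] is legal, else -inf."""
    return [[0.0 if is_legal(p, c) else -math.inf for c in tags] for p in tags]

def constrained_decode(emissions, tags):
    """Viterbi over per-token scores with illegal transitions masked out."""
    if not emissions:
        return []
    mask = build_mask(tags)
    n = len(tags)
    # A sequence cannot start inside an entity, so I-X is banned at position 0.
    score = [
        -math.inf if tags[j].startswith("I-") else emissions[0][j]
        for j in range(n)
    ]
    back = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + mask[i][j])
            new_score.append(score[best_i] + mask[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        back.append(ptrs)
    # Follow back-pointers from the best final tag.
    path = [max(range(n), key=lambda j: score[j])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [tags[i] for i in path]

tags = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
# Made-up scores: an unconstrained argmax at token 2 would pick I-LOC,
# which is illegal after B-PER.
emissions = [
    [0.1, 2.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0, 1.2],
    [2.0, 0.0, 0.0, 0.0, 0.0],
]
print(constrained_decode(emissions, tags))
```

In this toy input, a per-token argmax would yield B-PER, I-LOC, O, which does not decode into a well-formed span. The masked decode returns B-PER, I-PER, O instead. The mask here is hand-written from the encoding scheme rather than learned, which mirrors the abstract's point that prior knowledge of the span encoding can stand in for learned CRF transitions.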

Code Implementations: 1 repo