Self-Attention Gazetteer Embeddings for Named-Entity Recognition
This work aims to improve NER accuracy for NLP practitioners, but the contribution is incremental: it builds on existing methods and reports modest gains.
The paper improves named-entity recognition by integrating external knowledge from gazetteers, raising F1 from 92.34 to 92.86 on CoNLL-03 (+0.52) and from 89.11 to 89.32 on Ontonotes 5 (+0.21).
Recent attempts to incorporate external knowledge into neural models for named-entity recognition (NER) have produced mixed results. In this work, we present GazSelfAttn, a novel gazetteer embedding approach that uses self-attention and match span encoding to build enhanced gazetteer embeddings. In addition, we demonstrate how to build gazetteer resources from the open-source Wikidata knowledge base. Evaluations on the CoNLL-03 and Ontonotes 5 datasets show F1 improvements over the baseline model from 92.34 to 92.86 and from 89.11 to 89.32, respectively, achieving performance comparable to large state-of-the-art models.
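To make the idea of match span encoding more concrete, the following is a minimal Python sketch of how gazetteer matches might be turned into per-token span tags. It is an illustration under stated assumptions, not the GazSelfAttn implementation: the toy gazetteer, the function names `match_spans` and `span_encode`, the lowercase n-gram lookup, the `max_len` limit, and the B/I/L/U tag scheme are all hypothetical choices made here. The sketch covers only matching and tagging; the gazetteer embedding and self-attention layers are not shown.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy gazetteer: lowercase phrase -> set of entity types.
GAZETTEER: Dict[str, Set[str]] = {
    "new york": {"LOC"},
    "new york times": {"ORG"},
    "york": {"LOC"},
}


def match_spans(
    tokens: List[str], gazetteer: Dict[str, Set[str]], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, entity_type) for every n-gram found
    in the gazetteer. Overlapping matches are all kept."""
    matches = []
    n = len(tokens)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            phrase = " ".join(tokens[start:end]).lower()
            for entity_type in sorted(gazetteer.get(phrase, ())):
                matches.append((start, end, entity_type))
    return matches


def span_encode(
    tokens: List[str], matches: List[Tuple[int, int, str]]
) -> List[Set[str]]:
    """Assign each token the set of position-aware tags it receives:
    B(egin), I(nside), L(ast) for multi-token matches, U(nit) for
    single-token matches (an assumed tag scheme)."""
    tags: List[Set[str]] = [set() for _ in tokens]
    for start, end, entity_type in matches:
        if end - start == 1:
            tags[start].add(f"U-{entity_type}")
        else:
            tags[start].add(f"B-{entity_type}")
            for i in range(start + 1, end - 1):
                tags[i].add(f"I-{entity_type}")
            tags[end - 1].add(f"L-{entity_type}")
    return tags


tokens = ["New", "York", "Times", "reported"]
matches = match_spans(tokens, GAZETTEER)
tags = span_encode(tokens, matches)
for token, token_tags in zip(tokens, tags):
    print(token, sorted(token_tags))
```

In this toy example, "York" receives three tags at once (end of a LOC match, inside an ORG match, and a single-token LOC match). Keeping the position of a token within each match, rather than a single binary "in gazetteer" flag, is the kind of information a span encoding is meant to preserve; a model could then embed each tag and combine the embeddings per token.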