CL · IR · LG · Sep 23, 2019

Portuguese Named Entity Recognition using BERT-CRF

arXiv:1909.10649v2 · 294 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses named entity recognition for Portuguese. It is an incremental contribution that applies existing methods to new data.

The authors tackled Portuguese named entity recognition by training Portuguese BERT models and using a BERT-CRF architecture. They achieved new state-of-the-art results on the HAREM I dataset, with F1-score improvements of 1 point in the selective scenario (5 NE classes) and 4 points in the total scenario (10 NE classes).
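To make the reported gains concrete: NER F1 is usually computed at the entity level, where a predicted entity counts as correct only if both its span and its type exactly match the gold annotation, and "1 point" means one percentage point of F1. The sketch below assumes BIO-tagged sentences and CoNLL-style exact-match micro-averaging; the function names are hypothetical, and the HAREM evaluation used in the paper may differ in its details.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from BIO tags (an I- tag that does not continue
    the current entity starts a new one, as in CoNLL-style scoring)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label
    return spans


def micro_f1(gold_sents: List[List[str]], pred_sents: List[List[str]]) -> float:
    """Entity-level micro F1: counts are pooled over all sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]  # right span, wrong type for the 2nd entity
print(micro_f1([gold], [pred]))  # → 0.5
```

Because a span with the wrong type earns no credit, the second entity counts as both a false positive and a false negative, giving precision and recall of 0.5 each.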

Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of a trained model to downstream natural language processing tasks, such as named entity recognition (NER) and question answering. It has been shown that leveraging pre-trained language models improves overall performance on many tasks and is highly beneficial when labeled data is scarce. In this work, we train Portuguese BERT models and apply a BERT-CRF architecture to the NER task in Portuguese, combining the transfer capabilities of BERT with the structured predictions of CRF. We explore feature-based and fine-tuning training strategies for the BERT model. Our fine-tuning approach obtains new state-of-the-art results on the HAREM I dataset, improving the F1-score by 1 point in the selective scenario (5 NE classes) and by 4 points in the total scenario (10 NE classes).
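The "structured predictions of CRF" can be illustrated with a generic linear-chain CRF. In a BERT-CRF tagger, BERT supplies one emission score per tag for every token, and the CRF adds learned tag-to-tag transition scores, so the model scores whole tag sequences rather than each token independently. The framework-free sketch below is an illustration under stated assumptions, not the authors' implementation: it shows the forward algorithm (the log-partition term of the CRF negative log-likelihood used for training) and Viterbi decoding, with hypothetical hand-set toy scores for the tags O, B-PER and I-PER.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical linear-chain CRF over per-token emission scores.
# emissions[t][j]: score of tag j at token t (in BERT-CRF, e.g. a linear layer over BERT outputs)
# transitions[i][j]: score of moving from tag i to tag j
# start[j] / end[j]: score of beginning / ending a sequence with tag j
Matrix = Sequence[Sequence[float]]


def path_score(emissions: Matrix, transitions: Matrix, start: Sequence[float],
               end: Sequence[float], tags: Sequence[int]) -> float:
    """Unnormalized score of one complete tag sequence."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score + end[tags[-1]]


def log_sum_exp(values: Sequence[float]) -> float:
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions: Matrix, transitions: Matrix,
                  start: Sequence[float], end: Sequence[float]) -> float:
    """Forward algorithm: log of the summed exp-scores of all tag sequences."""
    n = len(start)
    alpha = [start[j] + emissions[0][j] for j in range(n)]
    for t in range(1, len(emissions)):
        alpha = [log_sum_exp([alpha[i] + transitions[i][j] for i in range(n)])
                 + emissions[t][j] for j in range(n)]
    return log_sum_exp([alpha[j] + end[j] for j in range(n)])


def viterbi(emissions: Matrix, transitions: Matrix, start: Sequence[float],
            end: Sequence[float]) -> Tuple[float, List[int]]:
    """Return (best score, best tag sequence) under the CRF scores."""
    n = len(start)
    delta = [start[j] + emissions[0][j] for j in range(n)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_delta, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + transitions[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] + transitions[best_i][j] + emissions[t][j])
        delta = new_delta
        backpointers.append(pointers)
    final = [delta[j] + end[j] for j in range(n)]
    last = max(range(n), key=lambda j: final[j])
    path = [last]
    for pointers in reversed(backpointers):  # follow backpointers to the start
        path.append(pointers[path[-1]])
    path.reverse()
    return final[last], path


# Toy example with tags O=0, B-PER=1, I-PER=2 (hand-set scores; learned in practice).
O, B_PER, I_PER = 0, 1, 2
NEG = -1e4  # effectively forbids a transition or start
emissions = [[2.0, 1.5, 0.0],    # token 1 alone prefers O
             [0.0, 0.5, 2.0]]    # token 2 alone prefers I-PER
transitions = [[0.0, 0.0, NEG],  # O -> I-PER is invalid in BIO
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, NEG]          # a sequence cannot start with I-PER
end = [0.0, 0.0, 0.0]

per_token = [max(range(3), key=lambda j: row[j]) for row in emissions]
print(per_token)  # → [0, 2]
print(viterbi(emissions, transitions, start, end))  # → (3.5, [1, 2])

# CRF training loss for a gold sequence: log Z minus the gold path score.
gold_tags = [B_PER, I_PER]
nll = (log_partition(emissions, transitions, start, end)
       - path_score(emissions, transitions, start, end, gold_tags))
```

Picking the best tag per token yields O followed by I-PER, an invalid BIO sequence; the strongly negative start and transition scores steer Viterbi to B-PER, I-PER instead, which is the kind of sequence-level consistency the CRF layer contributes on top of BERT's per-token scores.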

Code Implementations: 1 repo