CL AI · May 20, 2024

CoNLL#: Fine-grained Error Analysis and a Corrected Test Set for CoNLL-03 English

Andrew Rueda, Elena Álvarez Mellado, Constantine Lignos

arXiv:2405.11865v123.981 citations · h-index: 6 · LREC

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more interpretable and accurate benchmarking in named entity recognition, though its contribution is incremental: it corrects an existing dataset rather than proposing a new method.

The paper tackles the plateau in state-of-the-art performance on the CoNLL-03 English NER dataset by conducting a fine-grained error analysis and introducing CoNLL#, a corrected test set that fixes systematic annotation errors to enable low-noise evaluation.

Modern named entity recognition systems have steadily improved performance in the age of larger and more powerful neural models. However, over the past several years, the state-of-the-art has seemingly hit another plateau on the benchmark CoNLL-03 English dataset. In this paper, we perform a deep dive into the test outputs of the highest-performing NER models, conducting a fine-grained evaluation of their performance by introducing new document-level annotations on the test set. We go beyond F1 scores by categorizing errors in order to interpret the true state of the art for NER and guide future work. We review previous attempts at correcting the various flaws of the test set and introduce CoNLL#, a new corrected version of the test set that addresses its systematic and most prevalent errors, allowing for low-noise, interpretable error analysis.
