CL, AI, LG | May 13, 2024

NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition

Elena Merdjanovska, Ansar Aynetdinov, Alan Akbik

arXiv:2405.07609v2 | h-index: 3 | Has Code | EMNLP

Originality: Incremental advance

AI Analysis

The work addresses the need for realistic noise evaluation in NER. It is an incremental advance in that it builds on prior noise-robust learning approaches, contributing a new benchmark for evaluating them.

The paper tackles the problem of unrealistic simulated label noise in named entity recognition (NER) by introducing NoiseBench, a benchmark with real noise from sources such as human error and LLMs. It shows that real noise is significantly more challenging than simulated noise and that current noise-robust models fall far short of their theoretical upper bounds.

Available training data for named entity recognition (NER) often contains a significant percentage of incorrect labels for entity types and entity boundaries. Such label noise poses challenges for supervised learning and may significantly deteriorate model quality. To address this, prior work proposed various noise-robust learning approaches capable of learning from data with partially incorrect labels. These approaches are typically evaluated using simulated noise where the labels in a clean dataset are automatically corrupted. However, as we show in this paper, this leads to unrealistic noise that is far easier to handle than real noise caused by human error or semi-automatic annotation. To enable the study of the impact of various types of real noise, we introduce NoiseBench, an NER benchmark consisting of clean training data corrupted with 6 types of real noise, including expert errors, crowdsourcing errors, automatic annotation errors and LLM errors. We present an analysis that shows that real noise is significantly more challenging than simulated noise, and show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound. We release NoiseBench to the research community.
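To make the contrast between simulated and real noise concrete, here is a minimal Python sketch under stated assumptions. Entity annotations are represented as `(start, end, type)` token spans; simulated noise is uniform type-swapping at a fixed rate (one common simulation scheme, not necessarily the one used in the paper); and the noise level is measured as 1 minus the entity-level F1 of the noisy annotations against the clean ones (an assumed definition). The names `simulate_uniform_noise`, `noise_share`, and `ENTITY_TYPES` are hypothetical and not part of NoiseBench's released code.

```python
import random
from typing import List, Set, Tuple

# An entity span: (start_token, end_token_exclusive, entity_type)
Span = Tuple[int, int, str]

# CoNLL-03-style label set (assumption for illustration)
ENTITY_TYPES = ["PER", "LOC", "ORG", "MISC"]


def simulate_uniform_noise(spans: List[Span], noise_rate: float,
                           rng: random.Random) -> List[Span]:
    """Hypothetical simulated noise: with probability noise_rate, replace a
    span's type with a different, uniformly chosen type. Boundaries are
    never changed and no entities are added or removed."""
    noisy: List[Span] = []
    for start, end, label in spans:
        if rng.random() < noise_rate:
            label = rng.choice([t for t in ENTITY_TYPES if t != label])
        noisy.append((start, end, label))
    return noisy


def noise_share(clean: List[Span], noisy: List[Span]) -> float:
    """Assumed noise measure: 1 - entity-level F1 of noisy vs. clean spans.
    A span counts as correct only if boundaries and type both match."""
    clean_set: Set[Span] = set(clean)
    noisy_set: Set[Span] = set(noisy)
    if not clean_set and not noisy_set:
        return 0.0
    tp = len(clean_set & noisy_set)
    precision = tp / len(noisy_set) if noisy_set else 0.0
    recall = tp / len(clean_set) if clean_set else 0.0
    if precision + recall == 0:
        return 1.0
    f1 = 2 * precision * recall / (precision + recall)
    return 1.0 - f1


# Usage: a hand-written "real-style" error (a boundary mistake on the PER
# span plus a missed MISC entity) versus uniform type-swap noise.
clean = [(0, 2, "PER"), (5, 6, "LOC"), (8, 10, "ORG"), (12, 13, "MISC")]
real_style = [(0, 1, "PER"), (5, 6, "LOC"), (8, 10, "ORG")]
simulated = simulate_uniform_noise(clean, noise_rate=0.5, rng=random.Random(0))

print(f"real-style noise share: {noise_share(clean, real_style):.3f}")
print(f"simulated noise share:  {noise_share(clean, simulated):.3f}")
```

The design choice worth noting is that uniform type-swapping only ever corrupts entity types, whereas the abstract notes that real noise also affects entity boundaries; a sketch like this can help explain why such simulated noise may look quite different from the errors made by annotators, automatic tools, or LLMs.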
