CL · May 25, 2022

Detecting Label Errors by using Pre-Trained Language Models

Derek Chong, Jenny Hong, Christopher D. Manning

Stanford

arXiv:2205.12702v3

Originality: Incremental advance

AI Analysis

This work addresses label noise for NLP practitioners and offers a more effective error-detection method, though it is an incremental advance because it builds on existing pre-trained models.

The paper tackles label-error detection in natural language datasets. It shows that fine-tuning a pre-trained language model and then ranking out-of-sample data points by their task loss outperforms existing error-detection methods. On real errors in datasets such as IMDB and Amazon Reviews, this approach achieves a 9-36% higher absolute Area Under the Precision-Recall Curve (AUPRC).
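The ranking idea and its evaluation can be illustrated with a small sketch. It assumes you already have out-of-sample class probabilities for every example (for instance from k-fold cross-validation, so no example is scored by a model that trained on it; this setup and all function names here are illustrative assumptions, not the authors' code). Each example's loss is the cross-entropy of its given label, examples are sorted by descending loss, and AUPRC is estimated as average precision against known error flags.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Task loss for one example: negative log-probability of its given label."""
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)


def rank_by_loss(out_of_sample_probs: Sequence[Sequence[float]],
                 given_labels: Sequence[int]) -> List[int]:
    """Return example indices sorted by descending loss (most suspicious first)."""
    losses = [cross_entropy(p, y) for p, y in zip(out_of_sample_probs, given_labels)]
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)


def average_precision(ranking: Sequence[int], is_error: Sequence[bool]) -> float:
    """Estimate AUPRC as the mean precision at each rank where a true error appears."""
    total_errors = sum(is_error)
    if total_errors == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, idx in enumerate(ranking, start=1):
        if is_error[idx]:
            hits += 1
            precision_sum += hits / k
    return precision_sum / total_errors


# Hypothetical binary-sentiment example: examples 1 and 2 carry wrong labels.
probs = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.6, 0.4]]
labels = [0, 0, 1, 0]
is_error = [False, True, True, False]

ranking = rank_by_loss(probs, labels)
print(ranking)                                # [2, 1, 3, 0]
print(average_precision(ranking, is_error))   # 1.0
```

Here both mislabeled examples receive the highest losses, so they sit at the top of the ranking and average precision is 1.0. In practice a human reviewer would inspect the list from the top down.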

We show that large pre-trained language models are inherently highly capable of identifying label errors in natural language datasets: simply examining out-of-sample data points in descending order of fine-tuned task loss significantly outperforms more complex error-detection mechanisms proposed in previous work. To this end, we contribute a novel method for introducing realistic, human-originated label noise into existing crowdsourced datasets such as SNLI and TweetNLP. We show that this noise has similar properties to real, hand-verified label errors, and is harder to detect than existing synthetic noise, creating challenges for model robustness. We argue that human-originated noise is a better standard for evaluation than synthetic noise. Finally, we use crowdsourced verification to evaluate the detection of real errors on IMDB, Amazon Reviews, and Recon, and confirm that pre-trained models perform at a 9-36% higher absolute Area Under the Precision-Recall Curve than existing models.
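One way to picture "human-originated" noise is to reuse disagreements already present in crowdsourced annotations. The sketch below is a hypothetical illustration, not necessarily the paper's exact procedure: it assumes each example stores a majority `gold` label plus the individual `annotator_labels`, and for a chosen fraction of examples that have at least one dissenting annotator, it replaces the gold label with the most common dissenting label. The field names and `noise_rate` parameter are assumptions.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def inject_human_noise(examples: List[Dict], noise_rate: float,
                       seed: int = 0) -> Tuple[List[Dict], List[bool]]:
    """Hypothetical sketch: flip gold labels to a real annotator's dissenting label.

    Returns the noisy examples (with a new 'label' field) and per-example error flags.
    """
    rng = random.Random(seed)
    # Only examples where some annotator disagreed with gold can receive human noise.
    candidates = [i for i, ex in enumerate(examples)
                  if any(lab != ex["gold"] for lab in ex["annotator_labels"])]
    n_flip = min(len(candidates), round(noise_rate * len(examples)))
    flipped = set(rng.sample(candidates, n_flip))

    noisy, is_error = [], []
    for i, ex in enumerate(examples):
        label = ex["gold"]
        if i in flipped:
            # Use the dissenting label chosen most often by the annotators.
            dissent = Counter(lab for lab in ex["annotator_labels"] if lab != ex["gold"])
            label = dissent.most_common(1)[0][0]
        noisy.append({**ex, "label": label})
        is_error.append(i in flipped)
    return noisy, is_error


# Hypothetical SNLI-style annotations.
data = [
    {"gold": "entailment", "annotator_labels": ["entailment", "entailment", "neutral"]},
    {"gold": "neutral", "annotator_labels": ["neutral", "neutral", "neutral"]},
    {"gold": "contradiction", "annotator_labels": ["contradiction", "contradiction", "neutral"]},
]
noisy, flags = inject_human_noise(data, noise_rate=1.0)
print([ex["label"] for ex in noisy])  # ['neutral', 'neutral', 'neutral']
print(flags)                          # [True, False, True]
```

Unlike uniform synthetic flips, every injected error here is a label that a real annotator actually chose, which is the intuition behind treating such noise as more realistic. The returned flags can serve directly as the `is_error` ground truth when scoring a loss-based ranking.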

