CL, LG | Jan 27, 2021

Towards Robustness to Label Noise in Text Classification via Noise Modeling

arXiv:2101.11214v3 | 26 citations
Originality: Incremental advance
AI Analysis

This work addresses label noise in NLP datasets, a common issue in large-scale text classification. It is an incremental advance because it builds on existing noise modeling techniques.

The paper tackles the problem of text classification with noisy labels by using a beta mixture model to estimate label noise probabilities and guide learning, resulting in improved accuracy and reduced over-fitting on two text classification tasks.

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier. We first assign a probability score to each training sample of having a noisy label, through a beta mixture model fitted on the losses at an early epoch of training. Then, we use this score to selectively guide the learning of the noise model and classifier. Our empirical evaluation on two text classification tasks shows that our approach can improve over the baseline accuracy, and prevent over-fitting to the noise.
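The scoring step described above (a beta mixture model fitted on per-sample losses) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the two-component mixture, the EM loop with a weighted method-of-moments update, the min-max rescaling of losses, and all function names are assumptions made for the example. The component with the higher mean loss is treated as the "noisy" one.

```python
import math
import random


def _beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for x strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def _posteriors(x, params, weights):
    """E-step: responsibility of each component for each rescaled loss."""
    resp = []
    for xi in x:
        p = [weights[k] * _beta_pdf(xi, *params[k]) for k in range(2)]
        total = sum(p) + 1e-12
        resp.append([pk / total for pk in p])
    return resp


def noisy_label_probabilities(losses, n_iter=20, eps=1e-4):
    """Fit a 2-component beta mixture to losses; return P(noisy) per sample.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    lo, hi = min(losses), max(losses)
    # Rescale losses into (eps, 1 - eps) so the beta density is finite.
    x = [min(max((l - lo) / (hi - lo + 1e-12), eps), 1 - eps) for l in losses]

    # Assumed initialisation: component 0 favours low loss, component 1 high loss.
    params = [(1.0, 2.0), (2.0, 1.0)]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        resp = _posteriors(x, params, weights)
        # M-step: weighted method-of-moments estimate of (a, b) per component.
        for k in range(2):
            r = [ri[k] for ri in resp]
            r_sum = sum(r) + 1e-12
            mean = sum(rk * xi for rk, xi in zip(r, x)) / r_sum
            var = sum(rk * (xi - mean) ** 2 for rk, xi in zip(r, x)) / r_sum + 1e-12
            common = mean * (1 - mean) / var - 1
            params[k] = (max(mean * common, 1e-2), max((1 - mean) * common, 1e-2))
            weights[k] = r_sum / len(x)

    # The component with the larger mean loss is taken to model noisy labels.
    means = [a / (a + b) for a, b in params]
    noisy = 0 if means[0] > means[1] else 1
    return [ri[noisy] for ri in _posteriors(x, params, weights)]


if __name__ == "__main__":
    # Synthetic losses: many small (clean-looking) and a few large (noisy-looking).
    random.seed(0)
    demo = [random.uniform(0.05, 0.5) for _ in range(8)]
    demo += [random.uniform(2.0, 3.0) for _ in range(2)]
    for loss, prob in zip(demo, noisy_label_probabilities(demo)):
        print(f"loss={loss:.2f}  p_noisy={prob:.3f}")
```

In a training loop, the losses would be the per-sample classification losses collected at an early epoch, and the returned scores would then weight or gate how each sample contributes to the noise model and the classifier. How that guidance is applied is specific to the paper and is not shown here.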

Code Implementations: 1 repo
