CL, LG | Jan 27, 2021

Towards Robustness to Label Noise in Text Classification via Noise Modeling

Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe

arXiv:2101.11214v32.426 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses label noise in NLP datasets, a common issue in large-scale text classification. The advance is incremental because it builds on existing noise modeling techniques.

The paper tackles the problem of text classification with noisy labels by using a beta mixture model to estimate label noise probabilities and guide learning, resulting in improved accuracy and reduced over-fitting on two text classification tasks.

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier. We first assign a probability score to each training sample of having a noisy label, through a beta mixture model fitted on the losses at an early epoch of training. Then, we use this score to selectively guide the learning of the noise model and classifier. Our empirical evaluation on two text classification tasks shows that our approach can improve over the baseline accuracy, and prevent over-fitting to the noise.
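
The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes a two-component beta mixture fitted by EM with a weighted method-of-moments update, and treats the component with the higher mean loss as the "noisy" one. The function names (`beta_pdf`, `noise_scores`), the initial parameters, and the iteration count are hypothetical choices for illustration, not the authors' released implementation.

```python
import math
import numpy as np


def beta_pdf(x, a, b):
    """Beta density, computed in log space for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x))


def noise_scores(losses, n_iter=50, eps=1e-4):
    """Return, per sample, the posterior probability of the high-loss component.

    Assumption: `losses` are per-sample training losses recorded at an early epoch.
    """
    losses = np.asarray(losses, dtype=float)
    # Rescale into (0, 1), the support of the beta distribution.
    x = (losses - losses.min()) / (losses.max() - losses.min() + 1e-12)
    x = np.clip(x, eps, 1 - eps)

    # Hypothetical initialisation: component 0 leans low, component 1 leans high.
    alphas = np.array([1.0, 2.0])
    betas = np.array([2.0, 1.0])
    weights = np.array([0.5, 0.5])

    def responsibilities():
        lik = np.stack(
            [weights[k] * beta_pdf(x, alphas[k], betas[k]) for k in range(2)]
        )
        return lik / (lik.sum(axis=0) + 1e-12)

    for _ in range(n_iter):
        resp = responsibilities()  # E-step
        for k in range(2):  # M-step: weighted method of moments
            r = resp[k]
            mean = np.sum(r * x) / r.sum()
            var = np.sum(r * (x - mean) ** 2) / r.sum()
            common = mean * (1 - mean) / (var + 1e-12) - 1
            alphas[k] = max(mean * common, 1e-2)
            betas[k] = max((1 - mean) * common, 1e-2)
        weights = resp.sum(axis=1) / len(x)

    resp = responsibilities()
    # The component with the larger mean models the high-loss (likely noisy) samples.
    noisy = int(np.argmax(alphas / (alphas + betas)))
    return resp[noisy]
```

Scores close to 1 mark samples whose early-epoch loss looks more like the high-loss component. How such a score then guides the noise model and the classifier is specific to the paper and is not reproduced here.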

