CL · Jan 30, 2021

Learning From How Humans Correct

arXiv:2102.00225v20
Originality: Incremental advance
AI Analysis

This work addresses noisy data issues in industry NLP applications, but it is incremental as it builds on existing correction and relabeling approaches.

The paper tackles noisy labeled data in NLP applications. It proposes a method to identify noisy examples and relabel them manually while collecting the correction information, then incorporates that information into a deep learning model, improving classification accuracy from 91.7% to 92.5% on a test dataset.

In industry NLP applications, our manually labeled data contains a certain amount of noisy data. We present a simple method to find the noisy data and relabel it manually, collecting the correction information as we go. We then present a novel method to incorporate the human correction information into a deep learning model. Humans know how to correct noisy data, so the correction information can be injected into the deep learning model. We run the experiment on our own manually labeled text classification dataset, because we need to relabel the noisy data in it for our industry application. The experimental results show that our learn-on-correction method improves classification accuracy from 91.7% to 92.5% on the test dataset. The 91.7% accuracy comes from a model trained on the corrected dataset, which itself improves on the baseline from 83.3% to 91.7% on the test dataset. Accuracy under human evaluation exceeds 97%.
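The three reported test accuracies are easier to compare as error rates. The following is a minimal sketch of that arithmetic, not code from the paper: the helper name `error_reduction` and the step labels are hypothetical, and only the three accuracy figures come from the abstract.

```python
def error_reduction(acc_before, acc_after):
    """Return (absolute gain in points, relative error reduction in %).

    Hypothetical helper for illustration; both inputs are accuracies in percent.
    """
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    gain = acc_after - acc_before
    relative = 100.0 * (err_before - err_after) / err_before
    return gain, relative


# Test accuracies (%) as reported in the abstract.
BASELINE = 83.3             # trained on the original noisy labels
CORRECTED = 91.7            # trained on the manually corrected dataset
LEARN_ON_CORRECTION = 92.5  # corrected dataset plus correction information

steps = [
    ("relabeling noisy data", BASELINE, CORRECTED),
    ("learn-on-correction", CORRECTED, LEARN_ON_CORRECTION),
]
for name, before, after in steps:
    gain, relative = error_reduction(before, after)
    print(f"{name}: +{gain:.1f} points, {relative:.1f}% fewer errors")
```

Read this way, relabeling removes roughly half of the baseline's test errors, while the learn-on-correction step removes about a tenth of the errors that remain.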

Code Implementations: 1 repo