LG · CL · Feb 9, 2023

The Re-Label Method For Data-Centric Machine Learning

arXiv:2302.04391v102 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This paper addresses data quality issues for practitioners of industrial deep learning, but it is incremental, building on existing data-centric approaches.

The paper tackles the problem of noisy manually labeled data in industrial deep learning applications. It introduces a method that identifies noisy samples and has humans re-label them, using model predictions as references, with the aim of scoring above 90 on the dev dataset.

In industrial deep learning applications, manually labeled data contains a certain amount of noise. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method to find the noisy data and have humans re-label it, with the model predictions given as references during labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The dev dataset evaluation results and human evaluation results verify our idea.
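
The re-label loop described in the abstract can be sketched for the classification case as follows. This is a minimal illustration, not the authors' implementation. It assumes that a sample counts as "possibly noisy" when the model prediction disagrees with the manual label; the abstract does not state the exact criterion. The names `ReviewItem`, `find_relabel_candidates`, and `apply_relabels` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ReviewItem:
    index: int             # position of the sample in the dataset
    human_label: str       # original manual label, possibly noisy
    model_prediction: str  # shown to the annotator as a reference


def find_relabel_candidates(
    labels: Sequence[str], predictions: Sequence[str]
) -> List[ReviewItem]:
    """Flag samples whose manual label disagrees with the model prediction.

    Assumption: disagreement is used as the noise signal. Flagged samples
    are only candidates; a human makes the final decision.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    return [
        ReviewItem(i, y, p)
        for i, (y, p) in enumerate(zip(labels, predictions))
        if y != p
    ]


def apply_relabels(labels: Sequence[str], corrections: Dict[int, str]) -> List[str]:
    """Return a copy of the labels with the human corrections applied."""
    updated = list(labels)
    for index, new_label in corrections.items():
        updated[index] = new_label
    return updated


# Toy data: the label at index 1 disagrees with the model prediction.
labels = ["cat", "dog", "cat", "dog"]
predictions = ["cat", "cat", "cat", "dog"]

queue = find_relabel_candidates(labels, predictions)
# An annotator reviews each queued item with the prediction as a hint;
# here we pretend the annotator agreed with the model for index 1.
cleaned = apply_relabels(labels, {1: "cat"})
```

In this sketch the model only proposes candidates and a reference answer. The corrected label always comes from the human reviewer, which matches the abstract's description of re-labeling "by human".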

Code Implementations: 1 repo