CL, AI, LG · Nov 9, 2023

Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels

arXiv:2311.05265v1137 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This paper addresses the problem of annotation information being wasted in classification tasks and offers a practical improvement to data annotation and training processes.

The paper tackles the problem of discarding annotator disagreement in single-label classification by proposing a soft label method that uses ambiguous annotations, resulting in improved classifier performance and calibration on hard label test sets.

In this paper, we address the limitations of the common data annotation and training methods for objective single-label classification tasks. Typically, when annotating such tasks, annotators are only asked to provide a single label for each sample, and annotator disagreement is discarded when a final hard label is decided through majority voting. We challenge this traditional approach, acknowledging that determining the appropriate label can be difficult due to the ambiguity and lack of context in the data samples. Rather than discarding the information from such ambiguous annotations, our soft label method makes use of them for training. Our findings indicate that additional annotator information, such as confidence, secondary label and disagreement, can be used to effectively generate soft labels. Training classifiers with these soft labels then leads to improved performance and calibration on the hard label test set.
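To make the idea of a soft label concrete, here is a minimal sketch of one way annotator confidence, an optional secondary label, and disagreement between annotators could be combined into a probability distribution and scored against a classifier's output. The abstract does not give the paper's actual formula, so the function names (`annotation_to_soft_label`, `aggregate_annotations`, `soft_cross_entropy`) and the rule for spreading leftover probability mass are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Optional, Sequence


def annotation_to_soft_label(
    primary: int,
    confidence: float,
    num_classes: int,
    secondary: Optional[int] = None,
) -> List[float]:
    """Turn one annotation into a distribution over classes.

    Assumption (not from the paper): the primary label receives `confidence`;
    the remainder goes to the secondary label if one is given, otherwise it
    is spread evenly over all other classes.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes")

    remainder = 1.0 - confidence
    probs = [0.0] * num_classes
    probs[primary] = confidence
    if secondary is not None and secondary != primary:
        probs[secondary] += remainder
    else:
        share = remainder / (num_classes - 1)
        for k in range(num_classes):
            if k != primary:
                probs[k] += share
    return probs


def aggregate_annotations(soft_labels: Sequence[Sequence[float]]) -> List[float]:
    """Average per-annotator distributions, so disagreement stays visible
    in the final label instead of being removed by majority voting."""
    n = len(soft_labels)
    return [sum(column) / n for column in zip(*soft_labels)]


def soft_cross_entropy(
    target: Sequence[float], predicted: Sequence[float], eps: float = 1e-12
) -> float:
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


# Two annotators disagree on a three-class sample.
first = annotation_to_soft_label(0, 0.8, 3, secondary=1)
second = annotation_to_soft_label(1, 0.6, 3)
soft_label = aggregate_annotations([first, second])
loss = soft_cross_entropy(soft_label, [0.5, 0.4, 0.1])
```

With majority voting, the two annotations above would produce a tie that has to be broken arbitrarily; the averaged soft label instead keeps most of the mass on classes 0 and 1 and a small amount on class 2. Evaluation can still use hard labels, as in the abstract, by comparing the classifier's top prediction with the hard test label.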
