LG · AI · HC · Nov 1, 2023

Noise Correction on Subjective Datasets

arXiv:2311.00619v321.428 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses annotation bias in subjective datasets. It is an incremental advance of interest to researchers and practitioners in data modeling and machine learning.

The paper tackles annotations in subjective datasets that are distorted by annotator fatigue and changing opinions. It proposes multitask learning combined with loss-based label correction to separate agreeing from disagreeing annotations and to improve prediction performance, and it reports that the method remains robust to additional label noise.

Incorporating every annotator's perspective is crucial for unbiased data modeling. Annotator fatigue and changing opinions over time can distort dataset annotations. To combat this, we propose to learn a more accurate representation of diverse opinions by utilizing multitask learning in conjunction with loss-based label correction. We show that using our novel formulation, we can cleanly separate agreeing and disagreeing annotations. Furthermore, this method provides a controllable way to encourage or discourage disagreement. We demonstrate that this modification can improve prediction performance in a single or multi-annotator setting. Lastly, we show that this method remains robust to additional label noise that is applied to subjective data.
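The abstract does not spell out the formulation, so the following is only a minimal sketch of what loss-based label correction over per-annotator outputs might look like, not the authors' method. It assumes binary labels, one prediction head per annotator (the multitask part), and a fixed loss threshold; the function names `bce_per_annotation` and `correct_labels`, the threshold rule, and the toy numbers are all hypothetical.

```python
import numpy as np

def bce_per_annotation(probs, labels, eps=1e-7):
    """Binary cross-entropy for every (sample, annotator) pair."""
    p = np.clip(probs, eps, 1.0 - eps)
    return -(labels * np.log(p) + (1 - labels) * np.log(1 - p))

def correct_labels(probs, labels, threshold):
    """Replace annotations whose loss exceeds `threshold` with the model's prediction.

    probs:  (n_samples, n_annotators) predicted probability of the positive class,
            one column per annotator head (assumed multitask setup).
    labels: (n_samples, n_annotators) observed binary annotations.
    Returns the corrected labels and a boolean mask of flagged annotations.
    """
    losses = bce_per_annotation(probs, labels)
    flagged = losses > threshold                      # high loss: treated as likely noise
    predicted = (probs >= 0.5).astype(labels.dtype)   # the head's own hard prediction
    corrected = np.where(flagged, predicted, labels)
    return corrected, flagged

# Toy data (made up for illustration): 3 samples, 2 annotators.
probs = np.array([[0.9, 0.8],
                  [0.1, 0.95],
                  [0.6, 0.4]])
labels = np.array([[1, 1],
                   [1, 1],
                   [1, 0]])

corrected, flagged = correct_labels(probs, labels, threshold=1.0)
print(corrected)
print(flagged)
```

In this sketch the threshold is the control knob: a lower value overrides more annotations with the head's own prediction, while a higher value leaves more of the original, possibly disagreeing, labels in place. Whether that matches the paper's mechanism for encouraging or discouraging disagreement is an assumption.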
