Noise Correction on Subjective Datasets
This work addresses annotation biases in subjective datasets, offering an incremental improvement for researchers and practitioners in data modeling and machine learning.
The paper tackles distorted annotations in subjective datasets caused by annotator fatigue and changing opinions. It proposes a multitask learning approach with loss-based label correction that separates agreeing from disagreeing annotations and improves prediction performance, and it shows that the approach remains robust to additional label noise.
Incorporating every annotator's perspective is crucial for unbiased data modeling. Annotator fatigue and changing opinions over time can distort dataset annotations. To combat this, we propose to learn a more accurate representation of diverse opinions by utilizing multitask learning in conjunction with loss-based label correction. We show that our novel formulation cleanly separates agreeing and disagreeing annotations. Furthermore, this method provides a controllable way to encourage or discourage disagreement. We demonstrate that this modification can improve prediction performance in both single-annotator and multi-annotator settings. Lastly, we show that this method remains robust to additional label noise that is applied to subjective data.
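To make the idea of loss-based label correction concrete, the following is a minimal sketch and not the formulation used in this work, which the abstract does not specify. It assumes a multitask model with one output head per annotator, a per-annotation cross-entropy loss, and a simple rule that replaces any annotation whose loss exceeds a threshold with that head's own prediction. The names `annotation_loss`, `correct_labels`, and `threshold` are hypothetical and chosen for illustration only.

```python
import math


def annotation_loss(probs, label):
    """Cross-entropy of a single annotation under one annotator head."""
    return -math.log(max(probs[label], 1e-12))


def correct_labels(probs, labels, threshold):
    """Flag high-loss annotations and replace them with the head's prediction.

    probs:  probs[i][a] is the class distribution predicted for item i by the
            head assigned to annotator a (assumed layout).
    labels: labels[i][a] is the class annotator a gave to item i.
    threshold: hypothetical knob; lowering it corrects more annotations.
    """
    corrected, flagged = [], []
    for item_probs, item_labels in zip(probs, labels):
        row_labels, row_flags = [], []
        for head_probs, label in zip(item_probs, item_labels):
            high_loss = annotation_loss(head_probs, label) > threshold
            predicted = max(range(len(head_probs)), key=head_probs.__getitem__)
            row_labels.append(predicted if high_loss else label)
            row_flags.append(high_loss)
        corrected.append(row_labels)
        flagged.append(row_flags)
    return corrected, flagged


# Toy data: 2 items, 2 annotator heads, 2 classes (made up for illustration).
probs = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.95, 0.05]],
]
labels = [
    [0, 1],
    [1, 1],
]

corrected, flagged = correct_labels(probs, labels, threshold=1.0)
```

In this sketch the threshold plays the role of the control mentioned above: a high value leaves most annotations untouched and so preserves disagreement between annotators, while a low value overwrites more of them. How the actual method sets or schedules such a control is not stated in the abstract.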