The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels
This foundational position paper challenges assumptions in data labeling for machine learning, with potentially broad impact on how human labels are captured and used.
The paper examines perspectivist approaches that treat annotator disagreement as valuable information rather than a problem to minimize, concluding with recommendations for data labeling pipelines and future research on subjectivity.
Longstanding data labeling practices in machine learning involve collecting and aggregating labels from multiple annotators. But what should we do when annotators disagree? Though annotator disagreement has long been seen as a problem to minimize, new perspectivist approaches challenge this assumption by treating disagreement as a valuable source of information. In this position paper, we examine practices and assumptions surrounding the causes of disagreement (some challenged by perspectivist approaches, and some that remain to be addressed), as well as practical and normative challenges for work operating under these assumptions. We conclude with recommendations for the data labeling pipeline and avenues for future research engaging with subjectivity and disagreement.