CL · Jan 14

Beyond Consensus: Perspectivist Modeling and Evaluation of Annotator Disagreement in NLP

arXiv:2601.09065v1 · 5 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge NLP researchers face in handling subjective and ambiguous tasks such as toxicity detection, but as a survey it is incremental: it synthesizes existing work rather than introducing new methods.

This survey tackles the problem of annotator disagreement in NLP by providing a unified taxonomy of its sources and synthesizing modeling approaches, highlighting a shift from consensus learning to explicitly modeling disagreement and capturing structured relationships among annotators.

Annotator disagreement is widespread in NLP, particularly for subjective and ambiguous tasks such as toxicity detection and stance analysis. While early approaches treated disagreement as noise to be removed, recent work increasingly models it as a meaningful signal reflecting variation in interpretation and perspective. This survey provides a unified view of disagreement-aware NLP methods. We first present a domain-agnostic taxonomy of the sources of disagreement spanning data, task, and annotator factors. We then synthesize modeling approaches using a common framework defined by prediction targets and pooling structure, highlighting a shift from consensus learning toward explicitly modeling disagreement, and toward capturing structured relationships among annotators. We review evaluation metrics for both predictive performance and annotator behavior, noting that most fairness evaluations remain descriptive rather than normative. We conclude by identifying open challenges and future directions, including integrating multiple sources of variation, developing disagreement-aware interpretability frameworks, and grappling with the practical tradeoffs of perspectivist modeling.
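
As a rough illustration of the shift the abstract describes, the sketch below contrasts a consensus prediction target (a majority-vote label) with a disagreement-aware one (the empirical label distribution across annotators). It is not taken from the survey: the function names, the tie-breaking rule, and the toxicity votes are all hypothetical, chosen only to make the contrast concrete.

```python
from collections import Counter

def majority_label(annotations):
    """Consensus target: the most frequent label.

    Ties are broken by sorted label order (an arbitrary choice for this sketch).
    """
    counts = Counter(annotations)
    top = max(counts.values())
    return sorted(label for label, c in counts.items() if c == top)[0]

def soft_label(annotations, label_set):
    """Disagreement-aware target: empirical distribution over labels."""
    counts = Counter(annotations)
    total = len(annotations)
    return {label: counts[label] / total for label in label_set}

# Hypothetical toxicity annotations from five annotators for a single item.
votes = ["toxic", "not_toxic", "toxic", "not_toxic", "toxic"]
labels = ["not_toxic", "toxic"]

consensus = majority_label(votes)        # collapses the votes to one label
distribution = soft_label(votes, labels) # keeps the 3-vs-2 split visible

print(consensus)
print(distribution)
```

The majority label discards the fact that two of five annotators disagreed, while the distribution preserves it as a training or evaluation target. Modeling individual annotators, which the abstract also mentions, would require keeping annotator identities alongside the votes, which this sketch does not do.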
