Human Label Variation in Implicit Discourse Relation Recognition
This work addresses human label variation in Implicit Discourse Relation Recognition (IDRR), a challenge that is crucial to handle when developing more robust and reliable NLP models for discourse understanding.
This paper investigates human label variation in Implicit Discourse Relation Recognition (IDRR), a task known for high ambiguity due to cognitive complexity. The study finds that models predicting full annotation distributions produce more stable predictions than existing annotator-specific models, which perform poorly in IDRR unless ambiguity is reduced.
There is growing recognition that many NLP tasks lack a single ground truth, as human judgments reflect diverse perspectives. To capture this variation, models have been developed to predict full annotation distributions rather than majority labels, while perspectivist models aim to reproduce the interpretations of individual annotators. In this work, we compare these approaches on Implicit Discourse Relation Recognition (IDRR), a highly ambiguous task where disagreement often arises from cognitive complexity rather than ideological bias. Our experiments show that existing annotator-specific models perform poorly in IDRR unless ambiguity is reduced, whereas models trained on label distributions yield more stable predictions. Further analysis indicates that frequent cognitively demanding cases drive inconsistency in human interpretation, posing challenges for perspectivist modeling in IDRR.
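To make the contrast between majority-label and distribution-based targets concrete, the following is a minimal sketch and not the authors' implementation. The label inventory, the example annotations, and the function names are all hypothetical and chosen only for illustration. It shows how several annotations of one item can be collapsed into a single majority label or kept as a probability distribution, and how a cross-entropy loss against that distribution could be computed.

```python
from collections import Counter
import math

# Hypothetical label set for illustration only; the actual relation
# inventory used in the paper is not stated in this text.
LABELS = ["Cause", "Contrast", "Conjunction", "Instantiation"]


def label_distribution(annotations, labels=LABELS):
    """Turn one item's raw annotations into a probability distribution."""
    counts = Counter(annotations)
    total = sum(counts[label] for label in labels)
    if total == 0:
        raise ValueError("no annotation matches the label set")
    return [counts[label] / total for label in labels]


def majority_label(annotations):
    """Collapse annotations to the single most frequent label.

    Ties are broken in favour of the label that was seen first.
    """
    return Counter(annotations).most_common(1)[0][0]


def soft_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Five hypothetical annotators label the same implicit relation.
annotations = ["Cause", "Cause", "Conjunction", "Cause", "Conjunction"]

dist = label_distribution(annotations)  # keeps the 3-vs-2 split
hard = majority_label(annotations)      # discards the minority reading

# A prediction that mirrors the split, and one that is confident in "Cause".
pred_matching = [0.6, 0.0, 0.4, 0.0]
pred_confident = [0.97, 0.01, 0.01, 0.01]

loss_matching = soft_cross_entropy(dist, pred_matching)
loss_confident = soft_cross_entropy(dist, pred_confident)
```

Under this sketch, the prediction that mirrors the annotator split receives a lower loss than the confident single-label prediction, which is the behaviour a distribution-based objective is meant to reward. An annotator-specific (perspectivist) model would instead keep each annotator's label as a separate target, which this sketch does not attempt.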