CL · May 24, 2023

You Are What You Annotate: Towards Better Models through Annotator Representations

arXiv:2305.14663v2 · 155 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of handling subjectivity and variability in data annotation, a concern for NLP researchers and practitioners, and offers an incremental improvement over existing label-aggregation methods.

The paper tackles annotator disagreement in NLP tasks by learning annotator and annotation embeddings that capture diverse perspectives. The approach yields significantly better performance on six of eight datasets while increasing the number of model parameters by less than 1%.

Annotator disagreement is ubiquitous in natural language processing (NLP) tasks. There are multiple reasons for such disagreements, including the subjectivity of the task, difficult cases, unclear guidelines, and so on. Rather than simply aggregating labels to obtain data annotations, we instead try to directly model the diverse perspectives of the annotators, and explicitly account for annotators' idiosyncrasies in the modeling process by creating representations for each annotator (annotator embeddings) and also their annotations (annotation embeddings). In addition, we propose TID-8, The Inherent Disagreement - 8 dataset, a benchmark that consists of eight existing language understanding datasets that have inherent annotator disagreement. We test our approach on TID-8 and show that our approach helps models learn significantly better from disagreements on six different datasets in TID-8 while increasing model size by fewer than 1% parameters. By capturing the unique tendencies and subjectivity of individual annotators through embeddings, our representations prime AI models to be inclusive of diverse viewpoints.
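To make the idea of annotator and annotation embeddings concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes (hypothetically) that an annotator embedding is a per-annotator lookup vector, that an annotation embedding is the mean of per-label vectors over an annotator's past labels, and that both are added to the text representation as a weighted sum. All names (`make_table`, `annotation_embedding`, `combine`, `EMBED_DIM`) and all values are illustrative.

```python
import random

EMBED_DIM = 4  # hypothetical size, kept small for readability


def make_table(keys, dim, rng):
    """Build a lookup table mapping each key to a small random vector."""
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def annotation_embedding(past_labels, label_table, dim):
    """Assumed definition: mean of the label vectors an annotator has used."""
    if not past_labels:
        return [0.0] * dim
    vecs = [label_table[label] for label in past_labels]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def combine(text_vec, annotator_vec, annotation_vec,
            w_annotator=1.0, w_annotation=1.0):
    """Weighted sum of the text vector and the two annotator-specific vectors."""
    return [
        t + w_annotator * a + w_annotation * b
        for t, a, b in zip(text_vec, annotator_vec, annotation_vec)
    ]


rng = random.Random(0)
annotator_table = make_table(["ann_1", "ann_2"], EMBED_DIM, rng)
label_table = make_table(["positive", "negative"], EMBED_DIM, rng)

# Illustrative labeling histories: the two annotators have different tendencies.
history = {
    "ann_1": ["positive", "positive", "negative"],
    "ann_2": ["negative", "negative"],
}

text_vec = [0.5, -0.2, 0.1, 0.3]  # stand-in for an encoder's sentence vector

# One input per (text, annotator) pair, instead of one aggregated label per text.
per_annotator_inputs = {
    name: combine(
        text_vec,
        annotator_table[name],
        annotation_embedding(history[name], label_table, EMBED_DIM),
    )
    for name in annotator_table
}
```

Under these assumptions, the same text yields a different model input for each annotator, so a downstream classifier can predict each annotator's label rather than a single aggregated one. The added parameters are limited to the lookup tables, which is consistent in spirit with the small parameter overhead the abstract reports.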

Code Implementations: 1 repo