CL, AI, LG | May 2, 2024

The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation

arXiv:2405.01299v2 | 95 citations | h-index: 18 | NLPERSPECTIVES
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of evaluating LLMs as annotators for researchers and practitioners, but it is incremental as it builds on existing studies.

This paper analyzed twelve studies on using LLMs for data annotation and found that LLMs offer cost and time benefits but have limitations such as bias and prompt sensitivity. Its empirical analysis showed that opinion distributions obtained directly from GPT align with human distributions across four subjective datasets.

Large Language Models (LLMs) have emerged as powerful support tools across various natural language tasks and a range of application domains. Recent studies focus on exploring their capabilities for data annotation. This paper provides a comparative overview of twelve studies investigating the potential of LLMs in labelling data. While the models demonstrate promising cost and time-saving benefits, there exist considerable limitations, such as representativeness, bias, sensitivity to prompt variations and English language preference. Leveraging insights from these studies, our empirical analysis further examines the alignment between human and GPT-generated opinion distributions across four subjective datasets. In contrast to the studies examining representation, our methodology directly obtains the opinion distribution from GPT. Our analysis thereby supports the minority of studies that are considering diverse perspectives when evaluating data annotation tasks and highlights the need for further research in this direction.
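
To make the idea of alignment between human and model opinion distributions concrete, here is a minimal sketch of one way to compare two label distributions for a single item. It uses the Jensen-Shannon divergence, which is an assumption chosen for illustration; the abstract does not state which measure the authors used. The class names, the annotator votes, and the model distribution are all hypothetical.

```python
import math
from collections import Counter


def label_distribution(labels, classes):
    """Turn a list of annotator votes into a probability distribution over classes."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, 1 for disjoint ones."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical example: four human annotators label one item.
classes = ["offensive", "not_offensive"]
human_votes = ["offensive", "offensive", "not_offensive", "offensive"]
human_dist = label_distribution(human_votes, classes)

# Hypothetical distribution requested directly from the model for the same item.
model_dist = [0.5, 0.5]

# Lower values mean the model's distribution is closer to the human one.
score = js_divergence(human_dist, model_dist)
print(f"human={human_dist} model={model_dist} JSD={score:.4f}")
```

Averaging such a score over all items in a dataset would give one possible dataset-level alignment figure. Other measures, such as cross-entropy or total variation distance, would fit the same structure.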
