CL · AI · Nov 6, 2023

Context Unlocks Emotions: Text-based Emotion Classification Dataset Auditing with Large Language Models

arXiv:2311.03551v1 · 6 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses a data-quality problem that matters to researchers and practitioners in emotion classification, and it offers a scalable alternative to costly re-annotation. It is an incremental advance, since it builds on existing LLM capabilities.

The paper tackles misaligned labels in text-based emotion classification datasets, which arise because annotators lack context. It proposes using large language models to synthesize additional context for each input, and reports that this improves alignment with the human-annotated labels under both empirical and human evaluation.
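The summary reports improved alignment "from an empirical standpoint" but does not name the metric. One simple way to quantify input-label alignment, assuming a multi-label emotion classifier's predictions are available for both the original and the context-enhanced inputs, is the mean Jaccard similarity between predicted and annotated label sets. This is an illustrative stand-in, not necessarily the paper's measure; the function names and the toy labels below are hypothetical and are not results from the paper.

```python
from typing import Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two label sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_alignment(predicted: Sequence[Set[str]], annotated: Sequence[Set[str]]) -> float:
    """Average per-example Jaccard similarity between predicted and annotated labels.

    Assumption: 'alignment' is proxied by how well a classifier's predictions
    on an input recover that input's human-annotated emotion labels.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not predicted:
        raise ValueError("need at least one example")
    return sum(jaccard(p, g) for p, g in zip(predicted, annotated)) / len(predicted)


# Toy labels invented for illustration only.
annotated = [{"joy"}, {"anger", "disgust"}]
pred_original = [{"neutral"}, {"anger"}]           # predictions on the bare text
pred_enhanced = [{"joy"}, {"anger", "disgust"}]    # predictions on context-enhanced text

print(mean_alignment(pred_original, annotated))  # 0.25
print(mean_alignment(pred_enhanced, annotated))  # 1.0
```

Jaccard similarity is used here because emotion datasets are often multi-label, so a set-overlap score credits partially correct predictions instead of treating them as plain errors.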

The lack of contextual information in text data can make the annotation process of text-based emotion classification datasets challenging. As a result, such datasets often contain labels that fail to consider all the relevant emotions in the vocabulary. This misalignment between text inputs and labels can degrade the performance of machine learning models trained on top of them. As re-annotating entire datasets is a costly and time-consuming task that cannot be done at scale, we propose to use the expressive capabilities of large language models to synthesize additional context for input text to increase its alignment with the annotated emotional labels. In this work, we propose a formal definition of textual context to motivate a prompting strategy to enhance such contextual information. We provide both human and empirical evaluation to demonstrate the efficacy of the enhanced context. Our method improves alignment between inputs and their human-annotated labels from both an empirical and human-evaluated standpoint.
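The abstract describes a prompting strategy that asks an LLM to synthesize context for an input text so that the text better matches its annotated emotions. A minimal sketch of that idea follows, assuming the prompt is conditioned on the annotated labels and that any text-generation client can be wrapped as a `generate(prompt) -> str` callable. The template wording, `PROMPT_TEMPLATE`, `build_context_prompt`, and `enhance_with_context` are hypothetical; they do not reproduce the paper's formal definition of context or its actual prompt.

```python
from typing import Callable, Sequence

# Hypothetical prompt template (assumption): NOT the paper's wording.
PROMPT_TEMPLATE = (
    "The following text was annotated with the emotion(s): {labels}.\n"
    'Text: "{text}"\n'
    "Write one or two sentences of context that could precede this text and "
    "make these emotions clearly expressed. Do not rewrite the text itself.\n"
    "Context:"
)


def build_context_prompt(text: str, labels: Sequence[str]) -> str:
    """Fill the hypothetical template with the input text and its gold labels."""
    if not labels:
        raise ValueError("at least one emotion label is required")
    # Sorting makes the prompt deterministic regardless of label order.
    return PROMPT_TEMPLATE.format(labels=", ".join(sorted(labels)), text=text)


def enhance_with_context(
    text: str, labels: Sequence[str], generate: Callable[[str], str]
) -> str:
    """Prepend LLM-synthesized context to `text`; `generate` wraps any LLM client."""
    context = generate(build_context_prompt(text, labels)).strip()
    # Fall back to the original text if the model returns nothing useful.
    return f"{context} {text}" if context else text
```

Injecting `generate` keeps the sketch independent of any particular LLM API. For example, a stub such as `lambda prompt: "My sister just got into medical school."` turns the input `"I can't stop smiling!"` labeled `joy` into `"My sister just got into medical school. I can't stop smiling!"`, leaving the original text unchanged at the end.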
