CL · Feb 18

Label-Consistent Data Generation for Aspect-Based Sentiment Analysis Using LLM Agents

Mohammad H. A. Monfared, Lucie Flek, Akbar Karimi

arXiv:2602.16379v1 · h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses data scarcity for ABSA tasks, offering an incremental improvement in synthetic data generation methods.

The paper tackles the problem of generating high-quality synthetic training data for Aspect-Based Sentiment Analysis (ABSA). It proposes an agentic data augmentation method based on iterative generation and verification, which outperforms a prompting-based baseline in label preservation and yields larger gains when combined with real data, especially for the T5-Base model.
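The summary leans on "label preservation", which this page does not define. As a minimal sketch, assuming an ASPE-style example is a sentence plus a list of (aspect term, polarity) pairs, one simple way to score a batch of synthetic examples is the fraction in which every labeled aspect term still appears verbatim in the sentence and every polarity is a valid label. The names `AspectExample`, `is_label_consistent`, and `label_preservation_rate` are hypothetical, and the verbatim-containment rule is an illustrative assumption, not the paper's metric.

```python
from dataclasses import dataclass

# Assumed label set for aspect sentiment (illustrative).
POLARITIES = {"positive", "negative", "neutral"}


@dataclass
class AspectExample:
    sentence: str
    pairs: list[tuple[str, str]]  # (aspect term, polarity)


def is_label_consistent(ex: AspectExample) -> bool:
    """Hypothetical check: each aspect term occurs verbatim (case-insensitive)
    in the sentence and each polarity is one of the allowed labels."""
    text = ex.sentence.lower()
    return all(term.lower() in text and pol in POLARITIES for term, pol in ex.pairs)


def label_preservation_rate(examples: list[AspectExample]) -> float:
    """Fraction of synthetic examples whose labels still match their text."""
    if not examples:
        return 0.0
    kept = sum(is_label_consistent(ex) for ex in examples)
    return kept / len(examples)


if __name__ == "__main__":
    synthetic = [
        AspectExample("The pasta was great but the service was slow.",
                      [("pasta", "positive"), ("service", "negative")]),
        # The labeled aspect "screen" no longer appears in the sentence.
        AspectExample("Battery life is fine.", [("screen", "neutral")]),
    ]
    print(label_preservation_rate(synthetic))  # 0.5
```

A containment test like this only catches dropped or altered aspect terms; it cannot tell whether the sentiment expressed toward a term still matches its polarity label, which would need a classifier or LLM judge.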

We propose an agentic data augmentation method for Aspect-Based Sentiment Analysis (ABSA) that uses iterative generation and verification to produce high-quality synthetic training examples. To isolate the effect of the agentic structure, we also develop a closely matched prompting-based baseline that uses the same model and instructions. Both methods are evaluated across three ABSA subtasks (Aspect Term Extraction (ATE), Aspect Term Sentiment Classification (ATSC), and Aspect Sentiment Pair Extraction (ASPE)), four SemEval datasets, and two encoder-decoder models: T5-Base and Tk-Instruct. Our results show that agentic augmentation outperforms raw prompting in preserving the labels of the augmented data, especially when the tasks require aspect term generation. In addition, when combined with real data, agentic augmentation provides larger gains, consistently outperforming prompting-based generation. These benefits are most pronounced for T5-Base, while the more heavily pretrained Tk-Instruct exhibits smaller improvements. As a result, augmented data helps T5-Base achieve performance comparable to that of its counterpart, Tk-Instruct.
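The abstract does not describe the agent loop in detail. As a rough illustration only, an iterative generate-then-verify cycle could look like the following Python sketch. Here `generate` is a hypothetical stand-in for an LLM generator agent, `verify` is an assumed, deliberately simple verifier that only checks that every labeled aspect term survives, and `max_rounds` is an illustrative retry budget. None of these names come from the paper.

```python
from typing import Callable, Optional

Pair = tuple[str, str]             # (aspect term, polarity)
Example = tuple[str, list[Pair]]   # (sentence, aspect-sentiment pairs)


def verify(candidate: str, pairs: list[Pair]) -> Optional[str]:
    """Assumed verifier: return None if every labeled aspect term appears in
    the candidate, otherwise a feedback message naming the missing terms.
    Polarity is not checked here."""
    missing = [term for term, _ in pairs if term.lower() not in candidate.lower()]
    return None if not missing else f"missing aspect terms: {', '.join(missing)}"


def augment(seed: Example,
            generate: Callable[[Example, Optional[str]], str],
            max_rounds: int = 3) -> Optional[Example]:
    """Generate a candidate, verify it, and feed verifier feedback back to the
    generator until a candidate passes or the round budget runs out."""
    _, pairs = seed
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(seed, feedback)
        feedback = verify(candidate, pairs)
        if feedback is None:
            return (candidate, pairs)  # labels are carried over unchanged
    return None  # discard seeds that never yield a label-consistent candidate


def toy_generate(seed: Example, feedback: Optional[str]) -> str:
    """Stub generator: the first attempt drops an aspect; after feedback it fixes it."""
    if feedback is None:
        return "The food was amazing."
    return "The food was amazing and the staff were rude."


if __name__ == "__main__":
    seed = ("Great food but rude staff.", [("food", "positive"), ("staff", "negative")])
    print(augment(seed, toy_generate))
    # ('The food was amazing and the staff were rude.', [('food', 'positive'), ('staff', 'negative')])
```

The design point this sketch tries to capture is the contrast drawn in the abstract: a single prompting pass would have kept the first candidate, which loses the "staff" aspect, whereas the verification step rejects it and the retry recovers a label-consistent sentence.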

