CL | May 29, 2025

EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian

Daryna Dementieva, Nikolay Babakov, Alexander Fraser

arXiv:2505.23297v24.91 citations | h-index: 16 | EMNLP

Originality: Synthesis-oriented

AI Analysis

This work addresses the lack of emotion detection resources for Ukrainian NLP, though it is incremental as it adapts existing annotation schemas to a new language.

The authors introduced EmoBench-UA, the first annotated benchmark dataset for emotion detection in Ukrainian texts, and evaluated various approaches including linguistic baselines, synthetic translations, and large language models, finding significant challenges in emotion classification for non-mainstream languages like Ukrainian.

While Ukrainian NLP has seen progress in many text processing tasks, emotion classification remains an underexplored area with no publicly available benchmark to date. In this work, we introduce EmoBench-UA, the first annotated dataset for emotion detection in Ukrainian texts. Our annotation schema is adapted from the guidelines of previous English-centric works on emotion detection (Mohammad et al., 2018; Mohammad, 2022). The dataset was created through crowdsourcing on the Toloka.ai platform, ensuring a high-quality annotation process. We then evaluate a range of approaches on the collected dataset, from linguistic-based baselines and synthetic data translated from English to large language models (LLMs). Our findings highlight the challenges of emotion classification in non-mainstream languages like Ukrainian and emphasize the need for further development of Ukrainian-specific models and training resources.

