CL · May 17, 2025

EmoHopeSpeech: An Annotated Dataset of Emotions and Hope Speech in English and Arabic

arXiv:2505.11959v26.71 citations · h-index: 29 · RANLP

Originality: Synthesis-oriented

AI Analysis

This dataset provides a resource for NLP in underrepresented languages and enables cross-linguistic analysis of emotions and hope speech, but the contribution is incremental because it primarily introduces a new dataset.

The researchers tackled the scarcity of multi-emotion datasets by creating a bilingual dataset of 33,492 entries in Arabic and English, annotated for emotions and hope speech, with validation showing high annotator agreement (Fleiss' Kappa 0.75-0.85) and a baseline model achieving a micro-F1 score of 0.67.

This research introduces a bilingual dataset comprising 23,456 entries for Arabic and 10,036 entries for English, annotated for emotions and hope speech, addressing the scarcity of multi-emotion (emotion and hope) datasets. The dataset provides comprehensive annotations capturing emotion intensity, complexity, and causes, alongside detailed classifications and subcategories for hope speech. To assess annotation reliability, Fleiss' Kappa was computed, showing agreement of 0.75-0.85 among annotators for both Arabic and English. The micro-F1 score of 0.67 obtained from a baseline machine learning model supports the usability of the annotations. This dataset offers a valuable resource for advancing natural language processing in underrepresented languages, fostering better cross-linguistic analysis of emotions and hope speech.
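The two metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the abstract does not say how either metric was implemented, or whether micro-F1 was computed in a single-label or multi-label setting. The function names, the label strings, and the choice to represent each entry's labels as a set are assumptions made for illustration only.

```python
import math
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per annotator).

    Assumes every item was labeled by the same number of annotators (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = set().union(*ratings)

    # Per-item agreement: fraction of annotator pairs that agree on the item.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    total = n_items * n_raters
    shares = [sum(cnt[cat] for cnt in counts) / total for cat in categories]
    expected = sum(s * s for s in shares)

    if expected == 1.0:
        return math.nan  # undefined when only one category is ever used
    return (observed - expected) / (1.0 - expected)


def micro_f1(gold, pred):
    """Micro-averaged F1 over entries whose labels are given as sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)  # labels predicted and present in gold
        fp += len(p - g)  # labels predicted but absent from gold
        fn += len(g - p)  # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy annotations: three annotators, three entries.
    toy_ratings = [
        ["joy", "joy", "joy"],
        ["anger", "anger", "sadness"],
        ["hope", "hope", "hope"],
    ]
    print("Fleiss' kappa:", round(fleiss_kappa(toy_ratings), 3))

    toy_gold = [{"joy"}, {"anger", "sadness"}]
    toy_pred = [{"joy"}, {"anger"}]
    print("micro-F1:", round(micro_f1(toy_gold, toy_pred), 3))
```

Micro-averaging pools true positives, false positives, and false negatives across all labels before computing F1, so frequent labels weigh more than rare ones. In a strictly single-label setting it reduces to accuracy.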
