CL · Nov 3, 2020

XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Emily Öhman, Marc Pàmies, Kaisla Kajava, Jörg Tiedemann

arXiv:2011.01612v231.2999 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

The paper provides a new resource for sentiment analysis and emotion detection that particularly benefits low-resource languages, but the contribution is incremental because it builds on existing dataset creation methods.

The authors introduce XED, a multilingual fine-grained emotion dataset with human-annotated Finnish and English sentences and projected annotations for 30 additional languages. Evaluations with language-specific BERT models and SVMs show that it performs on par with similar datasets.

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik's core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
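As a minimal sketch of what a "multilabel multiclass" annotation scheme can look like in practice, the snippet below encodes sentence labels as multi-hot vectors over Plutchik's eight core emotions plus neutral. The label ordering, the helper names `to_multi_hot` and `from_multi_hot`, and the example sentences are illustrative assumptions, not XED's actual file format or label ids.

```python
# Plutchik's eight core emotions plus the added neutral class.
# NOTE: this ordering is an illustrative assumption, not XED's official label ids.
LABELS = [
    "neutral", "anger", "anticipation", "disgust",
    "fear", "joy", "sadness", "surprise", "trust",
]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}


def to_multi_hot(emotions):
    """Encode a collection of emotion names as a 0/1 vector over LABELS."""
    vector = [0] * len(LABELS)
    for name in emotions:
        vector[LABEL_INDEX[name]] = 1
    return vector


def from_multi_hot(vector):
    """Decode a 0/1 vector back into emotion names, in LABELS order."""
    return [LABELS[i] for i, flag in enumerate(vector) if flag]


# Hypothetical annotated sentences: one sentence may carry several emotions.
examples = [
    ("I can't believe you did that!", ["anger", "surprise"]),
    ("The train leaves at noon.", ["neutral"]),
]
encoded = [(text, to_multi_hot(labels)) for text, labels in examples]

for text, vector in encoded:
    print(vector, text)
```

Multi-hot vectors of this kind are a common input format for multilabel classifiers, including one-vs-rest SVMs and BERT models with a sigmoid output layer. The paper itself should be consulted for the exact evaluation setup.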
