CL · May 1, 2020

GoEmotions: A Dataset of Fine-Grained Emotions

arXiv:2005.00547v2 · 1104 citations
AI Analysis

GoEmotions gives researchers and developers a large-scale dataset for applications such as empathetic chatbots and detection of harmful online behavior, though its contribution is an incremental expansion of existing emotion annotation resources.

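The authors address fine-grained emotion understanding in language by introducing GoEmotions, a manually annotated dataset of 58k English Reddit comments labeled for 27 emotion categories or Neutral. Transfer learning experiments on existing emotion benchmarks show that the dataset generalizes to other domains and emotion taxonomies. Separately, a BERT-based model achieves an average F1-score of .46 across the proposed taxonomy, which leaves considerable room for improvement.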

Understanding emotion expressed in language has a wide range of applications, from building empathetic chatbots to detecting harmful online behavior. Advancement in this area can be improved using large-scale datasets with a fine-grained typology, adaptable to multiple downstream tasks. We introduce GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral. We demonstrate the high quality of the annotations via Principal Preserved Component Analysis. We conduct transfer learning experiments with existing emotion benchmarks to show that our dataset generalizes well to other domains and different emotion taxonomies. Our BERT-based model achieves an average F1-score of .46 across our proposed taxonomy, leaving much room for improvement.
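To make the "average F1-score across our proposed taxonomy" concrete, here is a minimal sketch of a per-category F1 averaged over labels in a multi-label setting, where each comment may carry more than one emotion. It assumes the average is an unweighted (macro) mean over categories. The function names and the toy gold and predicted labels are hypothetical illustrations, not data or code from the paper.

```python
from typing import List, Set


def per_label_f1(gold: List[Set[str]], pred: List[Set[str]], label: str) -> float:
    """F1 for one emotion label, treating it as a binary decision per comment."""
    tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when the label never occurs or is never predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1, so rare emotions count as much as common ones."""
    return sum(per_label_f1(gold, pred, label) for label in labels) / len(labels)


# Hypothetical toy example: three labels, four comments.
labels = ["joy", "anger", "neutral"]
gold = [{"joy"}, {"anger"}, {"neutral"}, {"joy", "anger"}]
pred = [{"joy"}, {"joy"}, {"neutral"}, {"anger"}]

score = macro_f1(gold, pred, labels)
print(f"macro F1 = {score:.3f}")
```

Because every category contributes equally under this kind of averaging, weak performance on infrequent emotions pulls the overall score down even when frequent ones are predicted well.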

Code Implementations: 9 repos

Data from Papers with Code (CC-BY-SA-4.0)

