CL, Jan 26

Fine-Grained Emotion Detection on GoEmotions: Experimental Comparison of Classical Machine Learning, BiLSTM, and Transformer Models

arXiv:2601.18162v1
h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses fine-grained emotion detection for NLP researchers and practitioners, but its contribution is incremental: it benchmarks existing methods on a known dataset.

The paper treats fine-grained emotion recognition as a multi-label NLP task and benchmarks logistic regression, BiLSTM, and BERT models on the GoEmotions dataset. Logistic regression achieves the highest Micro-F1 (0.51), while BERT attains the best overall balance, with Macro-F1 0.49, Hamming Loss 0.036, and Subset Accuracy 0.36.

Fine-grained emotion recognition is a challenging multi-label NLP task due to label overlap and class imbalance. In this work, we benchmark three modeling families on the GoEmotions dataset: a TF-IDF-based logistic regression system trained with binary relevance, a BiLSTM with attention, and a BERT model fine-tuned for multi-label classification. Experiments follow the official train/validation/test split, and imbalance is mitigated using inverse-frequency class weights. Across four metrics (Micro-F1, Macro-F1, Hamming Loss, and Subset Accuracy), we observe that logistic regression attains the highest Micro-F1 of 0.51, while BERT achieves the best overall balance, surpassing the official paper's reported results and reaching Macro-F1 0.49, Hamming Loss 0.036, and Subset Accuracy 0.36. This suggests that frequent emotions often rely on surface lexical cues, whereas contextual representations improve performance on rarer emotions and more ambiguous examples.
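
The four metrics named in the abstract respond differently to class imbalance, which is why one model can lead on Micro-F1 while another leads on Macro-F1. The sketch below computes them from binary label matrices in plain Python. It is an illustration of the standard definitions, not the authors' evaluation code: the function names, the toy labels, and the convention of scoring F1 as 0 when a label has no positives and no predictions are all assumptions.

```python
from typing import Dict, Sequence

# Rows are examples, columns are emotion labels, entries are 0 or 1.
Labels = Sequence[Sequence[int]]


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_metrics(y_true: Labels, y_pred: Labels) -> Dict[str, float]:
    """Compute Micro-F1, Macro-F1, Hamming Loss, and Subset Accuracy."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    wrong_bits = 0      # label decisions that disagree with the gold labels
    exact_matches = 0   # examples whose full label set is predicted exactly

    for t_row, p_row in zip(y_true, y_pred):
        if list(t_row) == list(p_row):
            exact_matches += 1
        for k, (t, p) in enumerate(zip(t_row, p_row)):
            if t == 1 and p == 1:
                tp[k] += 1
            elif t == 0 and p == 1:
                fp[k] += 1
            elif t == 1 and p == 0:
                fn[k] += 1
            if t != p:
                wrong_bits += 1

    return {
        # Micro: pool counts over all labels, so frequent labels dominate.
        "micro_f1": _f1(sum(tp), sum(fp), sum(fn)),
        # Macro: average per-label F1, so rare labels count equally.
        "macro_f1": sum(_f1(*c) for c in zip(tp, fp, fn)) / n_labels,
        "hamming_loss": wrong_bits / (n_samples * n_labels),
        "subset_accuracy": exact_matches / n_samples,
    }


# Toy data (hypothetical): label 0 is frequent, label 2 is rare and missed.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
scores = multilabel_metrics(y_true, y_pred)
print(scores)
```

On this toy input, Micro-F1 is 0.75 while Macro-F1 is about 0.56, because the missed rare label pulls the per-label average down far more than it affects the pooled counts. The same mechanism is consistent with the abstract's observation that a lexical model can score well on Micro-F1 while a contextual model does better on Macro-F1.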

