University of Indonesia at SemEval-2025 Task 11: Evaluating State-of-the-Art Encoders for Multi-Label Emotion Detection
This work addresses emotion classification for multilingual NLP applications, but its contribution is incremental: it applies existing methods to a new benchmark task.
The paper tackles multi-label emotion detection across 28 languages by comparing full fine-tuning with classifier-only training. It finds that CatBoost classifiers trained on top of prompt-based encoders outperform fully fine-tuned models; the best system, an ensemble of BGE-based models, achieves an average F1-macro score of 56.58.
This paper presents our approach for SemEval-2025 Task 11 Track A, which focuses on multi-label emotion classification across 28 languages. We explore two main strategies: fully fine-tuning transformer models and classifier-only training. Within these, we evaluate different settings, including fine-tuning strategies, model architectures, loss functions, encoders, and classifiers. Our findings suggest that training a classifier on top of prompt-based encoders such as mE5 and BGE yields significantly better results than fully fine-tuning XLMR and mBERT. Our best-performing model on the final leaderboard is an ensemble of multiple BGE models in different configurations, each using CatBoost as the classifier. This ensemble achieves an average F1-macro score of 56.58 across all languages.
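To make the reported metric concrete, the following is a minimal sketch of how an F1-macro score for multi-label predictions can be computed, together with one common way of combining several classifiers. The abstract does not state how the ensemble members are combined, so the probability averaging (soft voting) and the 0.5 threshold below are assumptions for illustration only, and the function names `soft_vote` and `macro_f1` and the toy data are hypothetical. The score is on a 0 to 1 scale here; the reported 56.58 corresponds to that value multiplied by 100.

```python
from typing import List, Sequence


def f1_for_label(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for a single emotion label; returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    per_label = [
        f1_for_label([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


def soft_vote(prob_sets: List[List[List[float]]], threshold: float = 0.5) -> List[List[int]]:
    """Average per-label probabilities across models, then threshold.

    Assumption: the paper's actual ensembling rule is not given in the abstract.
    """
    n_models = len(prob_sets)
    preds = []
    for rows in zip(*prob_sets):  # one probability row per model, same sample
        avg = [sum(vals) / n_models for vals in zip(*rows)]
        preds.append([1 if a >= threshold else 0 for a in avg])
    return preds


# Toy data: 4 samples, 3 emotion labels, 2 hypothetical classifiers.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
model_a = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.3], [0.7, 0.4, 0.2], [0.3, 0.1, 0.9]]
model_b = [[0.8, 0.1, 0.7], [0.3, 0.6, 0.1], [0.6, 0.5, 0.6], [0.2, 0.3, 0.7]]

preds = soft_vote([model_a, model_b])
score = macro_f1(y_true, preds)
print(f"F1-macro: {100 * score:.2f}")
```

In this toy example the ensemble misses one positive for the second label, so that label's F1 is 2/3 while the other two are 1, and the macro average is 8/9. Because every label contributes equally, rare emotions affect the score as much as frequent ones.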