CL, May 9

EmoS: A High-Fidelity Multimodal Benchmark for Fine-grained Streaming Emotional Understanding

Pengze Guo, Jingxi Liang, Zhiwen Xie, Qifeng Wang, Derek F. Wong

arXiv:2605.08847

AI Analysis

For researchers in emotion recognition and empathetic AI, EmoS provides a more ecologically valid and reliable benchmark for fine-grained emotional understanding.

Existing benchmarks for emotional understanding lack ecological validity, signal clarity, and fine-grained labeling. EmoS introduces a high-fidelity bilingual benchmark with static slices and a dynamic Streaming Monologue subset, using dual-layer human annotation, and shows that fine-tuning MLLMs on it yields significant gains over zero-shot baselines.

In the context of today's high-pressure, aging society, the demand for large-scale emotional models capable of providing empathetic support is more critical than ever. However, existing benchmarks fail to simultaneously achieve ecological validity, signal clarity, and reliable fine-grained labeling. We introduce EmoS, a high-fidelity bilingual benchmark designed to resolve the limitations of ecological validity and noise in existing datasets by combining strictly filtered static slices with a dynamic Streaming Monologue subset. Supported by a rigorous dual-layer human annotation pipeline, EmoS provides trusted ground truth that captures continuous emotional evolution. Empirical results show that fine-tuning MLLMs (multimodal large language models) on EmoS yields significant gains over zero-shot baselines, laying the foundation for the training and evaluation of future emotion recognition models and empathy models. The dataset and code are publicly available at https://github.com/NLP2CT/EmoS.
