CV | Aug 8, 2025

More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment

Jun Xie, Yingjian Zhu, Feng Chen, Zhenghao Zhang, Xiaohui Fan, Hongzhu Yi, Xinming Wang, Chen Yu, Yue Bi, Zhaoran Zhao, Xiongjun Guan, Zhepeng Wang

arXiv:2508.06036v110.25 citations | h-index: 6 | Has Code | MRAC@MM

Originality: Incremental advance

AI Analysis

This work addresses emotion recognition for applications such as human-computer interaction. Its contribution is incremental, since it builds on existing MoE and pseudo-labeling techniques.

The paper tackles semi-supervised emotion recognition by proposing a Mixture of Experts framework that treats diverse input modalities as independent experts and uses consensus-based pseudo-labeling, achieving an F1-score of 0.8772 on the test set and ranking 2nd in the MER2025-SEMI challenge.

In this paper, we present our solution for the semi-supervised learning track (MER-SEMI) in MER2025. We propose a comprehensive framework, grounded in the principle that "more is better," to construct a robust Mixture of Experts (MoE) emotion recognition system. Our approach integrates a diverse range of input modalities as independent experts, including novel signals such as knowledge from large Vision-Language Models (VLMs) and temporal Action Unit (AU) information. To effectively utilize unlabeled data, we introduce a consensus-based pseudo-labeling strategy, generating high-quality labels from the agreement between a baseline model and Gemini, which are then used in a two-stage training paradigm. Finally, we employ a multi-expert voting ensemble combined with a rule-based re-ranking process to correct prediction bias and better align the outputs with human preferences. Evaluated on the MER2025-SEMI challenge dataset, our method achieves an F1-score of 0.8772 on the test set, ranking 2nd in the track. Our code is available at https://github.com/zhuyjan/MER2025-MRAC25.
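The two ideas in the abstract that lend themselves to a short illustration are consensus-based pseudo-labeling and multi-expert voting. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the string label format, and the tie-breaking rule are all hypothetical. The rule-based re-ranking step is left out because the abstract does not specify its rules.

```python
from collections import Counter
from typing import List, Optional, Sequence


def consensus_pseudo_labels(
    baseline_preds: Sequence[str],
    vlm_preds: Sequence[str],
) -> List[Optional[str]]:
    """Keep a pseudo-label only where both label sources agree.

    Hypothetical helper: `baseline_preds` stands in for the baseline
    model's predictions and `vlm_preds` for the second source (Gemini in
    the abstract). Samples where the two disagree get None and would be
    excluded from pseudo-labeled training.
    """
    if len(baseline_preds) != len(vlm_preds):
        raise ValueError("prediction lists must have the same length")
    return [b if b == v else None for b, v in zip(baseline_preds, vlm_preds)]


def majority_vote(expert_preds: Sequence[str]) -> str:
    """Return the label predicted by the most experts for one sample.

    Assumed tie-break: among tied labels, the one that appears first in
    `expert_preds` wins. The source does not describe its tie handling.
    """
    if not expert_preds:
        raise ValueError("at least one expert prediction is required")
    counts = Counter(expert_preds)
    top = max(counts.values())
    for label in expert_preds:
        if counts[label] == top:
            return label
    raise AssertionError("unreachable: some label always has the top count")


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    baseline = ["happy", "sad", "angry"]
    vlm = ["happy", "neutral", "angry"]
    print(consensus_pseudo_labels(baseline, vlm))
    print(majority_vote(["sad", "happy", "happy"]))
```

In this sketch, unlabeled samples that receive `None` are simply dropped, which trades coverage for label quality. This is one plausible reading of "high-quality labels from the agreement" between the two sources.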
