CL SD Jan 8

A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction

Qing Wang, Zehan Li, Yaodong Song, Hongjie Chen, Jian Kang, Jie Lian, Jie Li, Yongxiang Li, Xuelong Li

arXiv:2601.04960v11.11 citations h-index: 3

Originality: Highly original

AI Analysis

This work addresses emotional intelligence in human-like spoken dialogue systems. It is an incremental improvement that applies a novel method to a known bottleneck.

The paper targets emotional intelligence in spoken language models. It introduces Injected Emotional-Attribution Thinking (IEAT), which incorporates user emotional states and their causes into the model's internal reasoning. The approach achieves top-ranked performance on the HumDial benchmark for emotional trajectory modeling, emotional reasoning, and empathetic response generation.

This paper presents a unified spoken language model for emotional intelligence, enhanced by a novel data construction strategy termed Injected Emotional-Attribution Thinking (IEAT). IEAT incorporates user emotional states and their underlying causes into the model's internal reasoning process, enabling emotion-aware reasoning to be internalized rather than treated as explicit supervision. The model is trained with a two-stage progressive strategy. The first stage performs speech-text alignment and emotional attribute modeling via self-distillation, while the second stage conducts end-to-end cross-modal joint optimization to ensure consistency between textual and spoken emotional expressions. Experiments on the Human-like Spoken Dialogue Systems Challenge (HumDial) Emotional Intelligence benchmark demonstrate that the proposed approach achieves top-ranked performance across emotional trajectory modeling, emotional reasoning, and empathetic response generation under both LLM-based and human evaluations.
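The abstract does not specify how an IEAT training sample is laid out, so the following is only a minimal sketch of the general idea: the user's emotional state and its cause are written into a reasoning segment that precedes the target response. The `Turn` class, the `inject_thinking` function, the `<think>` tags, and the `input`/`target` field names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One annotated dialogue turn (hypothetical schema, not from the paper)."""
    user_text: str  # transcript of the user's utterance
    emotion: str    # annotated emotional state, e.g. "sad"
    cause: str      # annotated reason for that state
    response: str   # target empathetic reply


def inject_thinking(turn: Turn) -> dict:
    """Prepend an emotional-attribution reasoning segment to the target reply.

    The <think>...</think> wrapper is an assumed format chosen for illustration.
    """
    thinking = f"The user sounds {turn.emotion} because {turn.cause}."
    target = f"<think>{thinking}</think>{turn.response}"
    return {"input": turn.user_text, "target": target}


sample = inject_thinking(
    Turn(
        user_text="I failed my exam.",
        emotion="sad",
        cause="they failed an exam",
        response="I'm sorry to hear that.",
    )
)
print(sample["target"])
```

In this sketch the reasoning is part of the sequence the model learns to produce, which is one plausible reading of emotion-aware reasoning being internalized rather than supplied as a separate supervision signal.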
