CL, SD, AS | Aug 26, 2025

Emotion Omni: Enabling Empathetic Speech Response Generation through Large Language Models

arXiv:2508.18655v3 | 2 citations | h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the need for emotionally aware speech assistants that can be built with limited data. It appears incremental because it builds on existing speech LLM frameworks.

The paper tackles the problem of generating empathetic speech responses in human-machine interaction by proposing Emotion Omni, a model that achieves comparable instruction-following ability without large-scale pretraining and surpasses existing models in speech quality (UTMOS: 4.41) and empathy (Emotion GPT Score: 3.97).

With the development of speech large language models (speech LLMs), users can now interact directly with assistants via speech. However, most existing models only convert response content into speech without fully capturing the rich emotional cues in user queries, where the same sentence may convey different meanings depending on the expression. Emotional understanding is thus essential for improving human-machine interaction. Most empathetic speech LLMs rely on massive datasets, demanding high computational cost. A key challenge is to build models that generate empathetic responses with limited data and without large-scale training. To this end, we propose Emotion Omni, a model that understands emotional content in user speech and generates empathetic responses. We further developed a data pipeline to construct a 200k emotional dialogue dataset supporting empathetic speech assistants. Experiments show that Emotion Omni achieves comparable instruction-following ability without large-scale pretraining, while surpassing existing models in speech quality (UTMOS: 4.41) and empathy (Emotion GPT Score: 3.97). These results confirm its improvements in both speech fidelity and emotional expressiveness. Demos are available at https://w311411.github.io/omni_demo/.
