HCAIMar 10

Reading the Mood Behind Words: Integrating Prosody-Derived Emotional Context into Socially Responsive VR Agents

arXiv:2603.09324v19.8h-index: 6
Predicted impact top 48% in HC · last 90 daysOriginality Incremental advance
AI Analysis

This addresses the issue of emotionally incongruent responses in VR interactions for users, though it is incremental as it builds on existing LLM and emotion recognition methods.

The paper tackled the problem of VR agents missing emotional cues from prosody by integrating real-time speech emotion recognition into an LLM-based pipeline, resulting in significant improvements in dialogue quality and 93.3% participant preference for the emotion-aware agent.

In VR interactions with embodied conversational agents, users' emotional intent is often conveyed more by how something is said than by what is said. However, most VR agent pipelines rely on speech-to-text processing, discarding prosodic cues and often producing emotionally incongruent responses despite correct semantics. We propose an emotion-context-aware VR interaction pipeline that treats vocal emotion as explicit dialogue context in an LLM-based conversational agent. A real-time speech emotion recognition model infers users' emotional states from prosody, and the resulting emotion labels are injected into the agent's dialogue context to shape response tone and style. Results from a within-subjects VR study (N=30) show significant improvements in dialogue quality, naturalness, engagement, rapport, and human-likeness, with 93.3% of participants preferring the emotion-aware agent.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes