AS · SD · Apr 13

HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models

arXiv:2604.11594 · 80.21 citations · h-index: 6
Predicted impact: top 18% in AS · last 90 days
Originality: Incremental advance
AI Analysis

Provides a more realistic and rigorous benchmark for emotional intelligence in audio language models, addressing limitations of existing synthetic and single-turn benchmarks.

HumDial-EIBench is a benchmark for evaluating emotional intelligence in audio language models using real-recorded multi-turn dialogues, reformulating tasks into multiple-choice questions to reduce bias. Evaluations of eight models show they struggle with multi-turn emotional tracking and implicit causal reasoning, and exhibit text-dominance bias in cross-modal conflicts.

Evaluating the emotional intelligence (EI) of audio language models (ALMs) is critical. However, existing benchmarks mostly rely on synthesized speech, are limited to single-turn interactions, and depend heavily on open-ended scoring. This paper proposes HumDial-EIBench, a comprehensive benchmark for evaluating ALMs' EI. Using real-recorded human dialogues from the ICASSP 2026 HumDial Challenge, it reformulates emotional tracking and causal reasoning into multiple-choice questions with adversarial distractors, mitigating subjective scoring bias for cognitive tasks. It retains the generation of empathetic responses and introduces an acoustic-semantic conflict task to assess robustness against contradictory multimodal signals. Evaluations of eight ALMs reveal that most models struggle with multi-turn emotional tracking and implicit causal reasoning. Furthermore, all models exhibit decoupled textual and acoustic empathy, alongside a severe text-dominance bias during cross-modal conflicts.
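The two quantitative ideas in the abstract, multiple-choice scoring and text-dominance bias under acoustic-semantic conflict, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ConflictItem` record, its field names, and the `text_dominance_rate` definition are hypothetical and are not taken from the benchmark's actual data format or metrics.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConflictItem:
    """Hypothetical record for one acoustic-semantic conflict sample."""
    text_emotion: str      # emotion implied by the words alone
    acoustic_emotion: str  # emotion conveyed by the voice
    predicted: str         # emotion label chosen by the model


def mcq_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of multiple-choice items where the chosen option matches the key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def text_dominance_rate(items: Sequence[ConflictItem]) -> float:
    """Among items whose text and acoustic cues disagree, the share of
    predictions that follow the text cue (an assumed, illustrative metric)."""
    conflicts = [it for it in items if it.text_emotion != it.acoustic_emotion]
    if not conflicts:
        return 0.0
    followed_text = sum(it.predicted == it.text_emotion for it in conflicts)
    return followed_text / len(conflicts)


if __name__ == "__main__":
    print(mcq_accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"]))
    samples = [
        ConflictItem("happy", "sad", "happy"),
        ConflictItem("calm", "angry", "calm"),
        ConflictItem("happy", "sad", "sad"),
    ]
    print(text_dominance_rate(samples))
```

Under this assumed definition, a rate near 1.0 would mean the model almost always sides with the words over the voice, which is the kind of behavior the abstract describes as text-dominance bias.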

