LG · AI · SD · Nov 12, 2025

Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues

arXiv:2511.11691v1
Originality: Incremental advance
AI Analysis

The paper addresses the need for more interpretable and trustworthy SER models, though the advance is incremental because it builds on existing saliency methods.

It tackles a gap in saliency-based explanations for speech emotion recognition (SER): the highlighted regions are not tied to meaningful acoustic properties. The proposed framework improves explanation quality by linking saliency to expert-referenced acoustic cues.

Explainable AI (XAI) for Speech Emotion Recognition (SER) is critical for building transparent, trustworthy models. Current saliency-based methods, adapted from vision, highlight spectrogram regions but fail to show whether these regions correspond to meaningful acoustic markers of emotion, which limits faithfulness and interpretability. We propose a framework that overcomes these limitations by quantifying the magnitudes of cues within salient regions. This clarifies "what" is highlighted and connects it to "why" it matters, linking saliency to expert-referenced acoustic cues of speech emotion. Experiments on benchmark SER datasets show that our approach improves explanation quality by explicitly linking salient regions to theory-driven, expert-referenced acoustic cues of speech emotion. Compared to standard saliency methods, it provides more understandable and plausible explanations of SER models, offering a foundational step towards trustworthy speech-based affective computing.
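The abstract does not spell out how cue magnitudes are quantified, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a saliency map shaped like the spectrogram and a per-frame acoustic cue (frame energy here), and compares the average cue value inside the most salient frames with the rest. The function name `cue_in_salient_region`, the `top_fraction` threshold, and the toy data are all hypothetical.

```python
import numpy as np


def cue_in_salient_region(saliency, cue, top_fraction=0.2):
    """Compare a per-frame acoustic cue inside vs. outside the most salient frames.

    saliency:     2-D array (freq_bins, frames) of non-negative saliency values.
    cue:          1-D array (frames,), e.g. per-frame energy (assumed input).
    top_fraction: share of frames treated as "salient" (illustrative choice).
    """
    saliency = np.asarray(saliency, dtype=float)
    cue = np.asarray(cue, dtype=float)
    if saliency.ndim != 2 or cue.shape != (saliency.shape[1],):
        raise ValueError("cue must have one value per spectrogram frame")

    # Collapse the frequency axis to get one saliency score per time frame.
    frame_saliency = saliency.sum(axis=0)

    # Mark the top-scoring frames as the salient region.
    n_salient = max(1, int(round(top_fraction * frame_saliency.size)))
    salient_idx = np.argsort(frame_saliency)[-n_salient:]
    mask = np.zeros(frame_saliency.size, dtype=bool)
    mask[salient_idx] = True

    # Average cue magnitude inside and outside the salient region.
    inside = float(cue[mask].mean())
    outside = float(cue[~mask].mean()) if (~mask).any() else float("nan")
    return {
        "salient_mean": inside,
        "background_mean": outside,
        "n_salient_frames": n_salient,
    }


# Toy data: a flat spectrogram with a loud burst in frames 3 and 4.
spec = np.ones((4, 10))
spec[:, 3:5] = 10.0
energy = spec.sum(axis=0)  # per-frame energy used as the acoustic cue

# Toy saliency map that highlights exactly the loud frames.
saliency = np.zeros((4, 10))
saliency[:, 3:5] = 1.0

result = cue_in_salient_region(saliency, energy, top_fraction=0.2)
print(result)
```

In this toy case the salient frames carry much higher energy than the background, which is the kind of statement ("the highlighted region is loud") that ties a saliency map to a named acoustic cue. A real analysis would substitute cues such as pitch or loudness extracted from the audio and saliency produced by an actual attribution method.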
