CL, SD, AS
Sep 14, 2023

Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features

arXiv:2309.07733v1
107 citations, h-index: 45
Originality: Incremental advance
AI Analysis

The paper addresses the lack of interpretable explanations for speech models, a gap that affects both users and researchers in speech processing. The contribution is incremental: it builds on existing XAI techniques.

The paper explains speech classification models with a method that generates word-level audio segment explanations and paralinguistic feature-based counterfactuals. The method is validated on state-of-the-art spoken language understanding models in English and Italian, and the explanations are reported to be faithful and plausible.

Recent advances in eXplainable AI (XAI) have provided new insights into how models for vision, language, and tabular data operate. However, few approaches exist for understanding speech models. Existing work focuses on a few spoken language understanding (SLU) tasks, and explanations are difficult to interpret for most users. We introduce a new approach to explain speech classification models. We generate easy-to-interpret explanations via input perturbation on two information levels. 1) Word-level explanations reveal how each word-related audio segment impacts the outcome. 2) Paralinguistic features (e.g., prosody and background noise) answer the counterfactual: "What would the model prediction be if we edited the audio signal in this way?" We validate our approach by explaining two state-of-the-art SLU models on two speech classification tasks in English and Italian. Our findings demonstrate that the explanations are faithful to the model's inner workings and plausible to humans. Our method and findings pave the way for future research on interpreting speech models.
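The word-level perturbation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of silencing as the perturbation, and the toy classifier are all assumptions made for illustration. Word boundaries are assumed to be supplied already (for example by a forced aligner).

```python
import numpy as np


def word_level_importance(audio, word_spans, predict_proba, target_class):
    """Score each word by the drop in target-class probability when its segment is silenced.

    audio: 1-D float array of samples.
    word_spans: list of (word, start_sample, end_sample); assumed to be given.
    predict_proba: hypothetical callable mapping a waveform to class probabilities.
    """
    base = predict_proba(audio)[target_class]
    scores = []
    for word, start, end in word_spans:
        perturbed = audio.copy()
        # Assumption: the perturbation is zeroing out the word's audio segment.
        perturbed[start:end] = 0.0
        drop = base - predict_proba(perturbed)[target_class]
        scores.append((word, float(drop)))
    return scores


def toy_predict_proba(audio):
    """Stand-in classifier, not a real SLU model.

    The probability of class 1 depends only on the energy in samples 100..200.
    """
    energy = float(np.mean(audio[100:200] ** 2))
    p1 = energy / (1.0 + energy)
    return np.array([1.0 - p1, p1])


# Toy utterance: three "words" of 100 samples each, constant amplitude.
audio = np.ones(300)
spans = [("turn", 0, 100), ("on", 100, 200), ("lights", 200, 300)]
scores = word_level_importance(audio, spans, toy_predict_proba, target_class=1)
for word, score in scores:
    print(f"{word}: {score:.2f}")
```

In this toy setup only the middle word receives a non-zero score, because the stand-in classifier looks at that segment alone. A paralinguistic counterfactual would follow the same pattern, replacing the silencing step with an edit such as a pitch shift or added background noise and comparing the prediction before and after.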

Code implementations: 1 repo