CL · AI · Sep 30, 2025

The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models

arXiv:2509.26543v1 · 2 citations · h-index: 34
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Originality: Incremental advance
AI Analysis

This work addresses the interpretability of speech-to-text models, a concern for both researchers and practitioners. It is an incremental advance: it extends existing contrastive explanation techniques to a new domain.

The paper tackles the challenge of obtaining contrastive explanations for speech-to-text models. It proposes the first method that analyzes how parts of the input spectrogram influence the choice between alternative outputs, and shows, in a case study on gender assignment in speech translation, that the method accurately identifies the audio features driving that choice.

Contrastive explanations, which indicate why an AI system produced one output (the target) instead of another (the foil), are widely regarded in explainable AI as more informative and interpretable than standard explanations. However, obtaining such explanations for speech-to-text (S2T) generative models remains an open challenge. Drawing from feature attribution techniques, we propose the first method to obtain contrastive explanations in S2T by analyzing how parts of the input spectrogram influence the choice between alternative outputs. Through a case study on gender assignment in speech translation, we show that our method accurately identifies the audio features that drive the selection of one gender over another. By extending the scope of contrastive explanations to S2T, our work provides a foundation for better understanding S2T models.
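
The abstract does not spell out the attribution procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple occlusion-based attribution (a swapped-in technique chosen for brevity): each time-frequency patch of the spectrogram is masked, and the patch is scored by how much masking it lowers the contrastive quantity log p(target) - log p(foil). The names `contrastive_occlusion`, `toy_model`, and the two-token toy setup are hypothetical and exist only for illustration.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over a 1-D vector of logits.
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

def contrastive_score(model, spec, target, foil):
    # How strongly the model prefers the target token over the foil token.
    logp = log_softmax(model(spec))
    return logp[target] - logp[foil]

def contrastive_occlusion(model, spec, target, foil, patch=(4, 4), fill=0.0):
    # Assumption: occlusion stands in for whatever attribution method is used.
    # Positive saliency: the patch supports the target over the foil.
    # Negative saliency: the patch supports the foil over the target.
    base = contrastive_score(model, spec, target, foil)
    saliency = np.zeros(spec.shape, dtype=float)
    n_freq, n_time = spec.shape
    pf, pt = patch
    for f in range(0, n_freq, pf):
        for t in range(0, n_time, pt):
            occluded = spec.copy()
            occluded[f:f + pf, t:t + pt] = fill
            drop = base - contrastive_score(model, occluded, target, foil)
            saliency[f:f + pf, t:t + pt] = drop
    return saliency

# Hypothetical toy "model": token 0 reads the low-frequency band,
# token 1 reads the high-frequency band of an 8x8 spectrogram.
weights = np.zeros((2, 8, 8))
weights[0, :4, :] = 1.0
weights[1, 4:, :] = 1.0

def toy_model(spec):
    # Returns one logit per output token.
    return (weights * spec).sum(axis=(1, 2))

spec = np.ones((8, 8))
saliency = contrastive_occlusion(toy_model, spec, target=0, foil=1)
```

In this toy setup the low-frequency rows receive positive saliency (they favor the target) and the high-frequency rows receive negative saliency (they favor the foil). A real S2T model would replace `toy_model` with a function returning the next-token logits for a fixed decoding prefix.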
