CV | Dec 2, 2024

Unveiling Interpretability in Self-Supervised Speech Representations for Parkinson's Diagnosis

David Gimeno-Gómez, Catarina Botelho, Anna Pompili, Alberto Abad, Carlos-D. Martínez-Hinarejos

arXiv:2412.02006v29.620 citations | h-index: 12 | Has Code | IEEE J Sel Top Signal Process

Originality: Incremental advance

AI Analysis

This paper addresses a barrier to the clinical adoption of computer-assisted diagnosis in pathological speech analysis, though its advance in interpretability is incremental.

The paper tackles the lack of interpretability in self-supervised speech representations for Parkinson's Disease diagnosis by proposing a framework built on cross-attention mechanisms. It achieves competitive classification accuracy and remains robust in cross-lingual scenarios.

Recent works in pathological speech analysis have increasingly relied on powerful self-supervised speech representations, leading to promising results. However, the complex, black-box nature of these embeddings and the limited research on their interpretability significantly restrict their adoption for clinical diagnosis. To address this gap, we propose a novel, interpretable framework specifically designed to support Parkinson's Disease (PD) diagnosis. Through the design of simple yet effective cross-attention mechanisms for both embedding- and temporal-level analysis, the proposed framework offers interpretability from two distinct but complementary perspectives. Experimental findings across five well-established speech benchmarks for PD detection demonstrate the framework's capability to identify meaningful speech patterns within self-supervised representations for a wide range of assessment tasks. Fine-grained temporal analyses further underscore its potential to enhance the interpretability of deep-learning pathological speech models, paving the way for the development of more transparent, trustworthy, and clinically applicable computer-assisted diagnosis systems in this domain. Moreover, in terms of classification accuracy, our method achieves results competitive with state-of-the-art approaches, while also demonstrating robustness in cross-lingual scenarios when applied to spontaneous speech production.
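
To make the idea of temporal-level cross-attention concrete, the following is a minimal sketch and not the paper's architecture. It assumes a single attention head, a single hypothetical query vector, and no learned key or value projections; the random `frames` array only stands in for frame-level self-supervised embeddings. The point it illustrates is that the attention weights form a distribution over time frames, which can be inspected to see which frames contributed most to the pooled representation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_cross_attention(query, frames):
    """Single-head scaled dot-product cross-attention over time.

    query:  (d,)   hypothetical query vector (an assumption of this sketch)
    frames: (T, d) frame-level embeddings, used here as both keys and values
    Returns the pooled vector (d,) and the attention weights (T,).
    """
    d = frames.shape[-1]
    scores = frames @ query / np.sqrt(d)  # one relevance score per frame
    weights = softmax(scores)             # distribution over the T frames
    pooled = weights @ frames             # weighted average of the frames
    return pooled, weights

rng = np.random.default_rng(0)
T, d = 50, 16
frames = rng.normal(size=(T, d))  # stand-in data, not real speech features
query = rng.normal(size=d)        # in practice this would be learned

pooled, weights = temporal_cross_attention(query, frames)

# Frames with the largest weights are the ones the model "attended" to most.
top_frames = np.argsort(weights)[::-1][:5]
print(top_frames)
```

In a trained model, plotting `weights` against time alongside the audio is one simple way to examine which segments of an utterance drove a prediction.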
