SDAIAS Dec 5, 2024

Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech

arXiv:2412.03784v1 | 5 citations | h-index: 8 | SLT
Originality: Incremental advance
AI Analysis

This work addresses the need for objective severity evaluation in dysarthric speech, offering an incremental improvement over existing methods.

The paper tackles automatic severity classification in dysarthric speech by proposing a feature extraction method based on ASR transcription, achieving a balanced accuracy of 83.72%.
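For readers unfamiliar with the metric, balanced accuracy is the mean of per-class recall, so each severity class counts equally regardless of how many samples it has. The sketch below is a minimal illustration of that standard definition; the class labels and predictions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Average the recall of each class, weighting every class equally.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical severity labels (assumption: not the paper's data or classes).
y_true = ["mild", "mild", "mild", "mild", "severe", "severe"]
y_pred = ["mild", "mild", "mild", "severe", "severe", "mild"]

score = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {score:.3f}, plain accuracy: {plain_accuracy:.3f}")
```

In this toy case the majority class inflates plain accuracy, while balanced accuracy reflects the weaker recall on the smaller class, which is why the metric suits imbalanced severity groups.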

Due to the subjective nature of current clinical evaluation, the need for automatic severity evaluation in dysarthric speech has emerged. DNN models outperform ML models but lack user-friendly explainability. ML models offer explainable results at the feature level, but their performance is comparatively lower. Current ML models extract various features from raw waveforms to predict severity. However, existing methods do not encompass all dysarthric features used in clinical evaluation. To address this gap, we propose a feature extraction method that minimizes information loss. We introduce ASR transcription as a novel feature extraction source. We fine-tune the ASR model for dysarthric speech, then use this model to transcribe dysarthric speech and extract word segment boundary information. This approach captures finer pronunciation features and broader prosodic features. These features improved severity prediction performance over existing features, reaching a balanced accuracy of 83.72%.
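To make the idea of using word segment boundaries concrete, the sketch below derives a few simple timing features from a list of word-level timestamps. It is only an illustration under stated assumptions: the `WordSegment` type, the `timing_features` helper, the feature names, and the example timestamps are all hypothetical, and the abstract does not specify which prosodic features the paper actually computes.

```python
from dataclasses import dataclass


@dataclass
class WordSegment:
    """One transcribed word with its boundaries in seconds (hypothetical type)."""
    word: str
    start: float
    end: float


def timing_features(segments):
    """Compute illustrative timing features from word segment boundaries."""
    if not segments:
        raise ValueError("at least one word segment is required")

    segs = sorted(segments, key=lambda s: s.start)
    total = segs[-1].end - segs[0].start  # span from first word onset to last offset
    word_durations = [s.end - s.start for s in segs]
    # Silence between consecutive words; overlaps are clipped to zero.
    pauses = [max(0.0, nxt.start - cur.end) for cur, nxt in zip(segs, segs[1:])]

    return {
        "speech_rate_wps": len(segs) / total if total > 0 else 0.0,
        "mean_word_duration": sum(word_durations) / len(word_durations),
        "total_pause": sum(pauses),
        "pause_ratio": sum(pauses) / total if total > 0 else 0.0,
    }


# Hypothetical boundaries, standing in for the output of a fine-tuned ASR model.
segments = [
    WordSegment("the", 0.0, 0.5),
    WordSegment("cat", 1.0, 1.5),
    WordSegment("sat", 1.5, 2.0),
]

features = timing_features(segments)
print(features)
```

Features of this kind stay interpretable at the feature level, which matches the abstract's motivation for preferring ML models with explainable inputs.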
