SD · CL · AS · Jul 26, 2024

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation

arXiv:2407.18461v1 · 11 citations · h-index: 10 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the high inter-speaker variability of dysarthric speech, offering users with disabilities a more convenient and cost-effective alternative to traditional per-speaker adaptation methods.

The paper tackles dysarthric speech recognition for unseen speakers by introducing a prototype-based adaptation method that avoids costly per-speaker fine-tuning while still improving recognition performance.

Dysarthric speech recognition (DSR) presents a formidable challenge due to inherent inter-speaker variability, leading to severe performance degradation when applying DSR models to new dysarthric speakers. Traditional speaker adaptation methodologies typically involve fine-tuning models for each speaker, but this strategy is cost-prohibitive and inconvenient for disabled users, requiring substantial data collection. To address this issue, we introduce a prototype-based approach that markedly improves DSR performance for unseen dysarthric speakers without additional fine-tuning. Our method employs a feature extractor trained with HuBERT to produce per-word prototypes that encapsulate the characteristics of previously unseen speakers. These prototypes serve as the basis for classification. Additionally, we incorporate supervised contrastive learning to refine feature extraction. By enhancing representation quality, we further improve DSR performance, enabling effective personalized DSR. We release our code at https://github.com/NKU-HLT/PB-DSR.
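
The prototype-based classification described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not taken from the released code: the function names (`build_prototypes`, `classify`), the use of mean-pooled embeddings as prototypes, the cosine-similarity decision rule, and the toy 3-dimensional vectors are all assumptions made for illustration. In the paper's setting the embeddings would come from the HuBERT-based feature extractor, and the authors' exact distance measure may differ.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_prototypes(support):
    """Average the normalized embeddings of each word into one prototype.

    `support` is a list of (word, embedding) pairs recorded by the new speaker.
    Hypothetical helper: mean pooling is an assumption, not the paper's stated rule.
    """
    grouped = defaultdict(list)
    for word, emb in support:
        grouped[word].append(l2_normalize(emb))

    prototypes = {}
    for word, embs in grouped.items():
        mean = [sum(col) / len(embs) for col in zip(*embs)]
        prototypes[word] = l2_normalize(mean)
    return prototypes


def classify(embedding, prototypes):
    """Return the word whose prototype has the highest cosine similarity."""
    query = l2_normalize(embedding)

    def cosine(proto):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(q * p for q, p in zip(query, proto))

    return max(prototypes, key=lambda word: cosine(prototypes[word]))


# Toy embeddings standing in for extractor outputs (assumed values).
support = [
    ("yes", [0.9, 0.1, 0.0]),
    ("yes", [0.8, 0.2, 0.1]),
    ("no", [0.1, 0.9, 0.2]),
    ("no", [0.0, 0.8, 0.3]),
]
prototypes = build_prototypes(support)
prediction = classify([0.7, 0.3, 0.1], prototypes)
print(prediction)  # yes
```

No model weights are updated in this step: adapting to a new speaker only requires recomputing the prototypes from that speaker's samples. Under this reading, the supervised contrastive learning mentioned in the abstract would act earlier, on the feature extractor, so that embeddings of the same word cluster more tightly around their prototype.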

Code Implementations: 1 repo