CL · SD · AS · Jun 2, 2025

Continual Speech Learning with Fused Speech Features

arXiv:2506.01496v2 · 2 citations · h-index: 44 · INTERSPEECH
Originality: Incremental advance
AI Analysis

The work addresses the adaptation gap in speech processing for applications that require flexible models, though the advance is incremental because it builds on the existing Whisper architecture.

The paper tackles the problem of adapting speech models to dynamic and diverse data by introducing continual speech learning, which uses a learnable gated-fusion layer with Whisper to improve accuracy across six speech tasks without full retraining.

Rapid growth in speech data demands adaptive models, as traditional static methods fail to keep pace with dynamic and diverse speech information. We introduce continual speech learning, a new set-up aimed at bridging the adaptation gap in current speech models. We use the encoder-decoder Whisper model to standardize speech tasks into a generative format. We integrate a learnable gated-fusion layer on top of the encoder to dynamically select task-specific features for downstream tasks. Our approach improves accuracy significantly over traditional methods in six speech processing tasks, demonstrating gains in adapting to new speech tasks without full retraining.
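
The abstract does not spell out how the gated-fusion layer is computed. As a rough illustration only, the sketch below assumes one common form of gated fusion: a softmax over one learnable logit per encoder layer, used to take a weighted sum of the per-layer features. The names `softmax`, `gated_fusion`, `layer_features`, and `gate_logits` are hypothetical and are not taken from the paper or from the Whisper codebase, and plain Python lists stand in for tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(layer_features, gate_logits):
    """Fuse per-layer encoder features with softmax-normalized gates.

    layer_features: list of L layers, each a list of T frames,
                    each frame a list of D floats.
    gate_logits:    list of L floats. In training these would be learnable
                    parameters (assumption: one scalar gate per layer).
    Returns a T x D nested list holding the weighted sum over layers.
    """
    if len(layer_features) != len(gate_logits):
        raise ValueError("need exactly one gate logit per encoder layer")

    gates = softmax(gate_logits)
    num_frames = len(layer_features[0])
    dim = len(layer_features[0][0])

    fused = [[0.0] * dim for _ in range(num_frames)]
    for gate, layer in zip(gates, layer_features):
        for t in range(num_frames):
            for d in range(dim):
                fused[t][d] += gate * layer[t][d]
    return fused


# Toy input: 2 encoder layers, 2 frames, 2 feature dimensions.
layers = [
    [[1.0, 0.0], [1.0, 0.0]],  # features from a lower layer
    [[0.0, 1.0], [0.0, 1.0]],  # features from a higher layer
]

# Equal logits weight both layers equally.
uniform = gated_fusion(layers, [0.0, 0.0])

# A large logit gap makes the fusion select almost only the first layer.
selective = gated_fusion(layers, [10.0, -10.0])
```

Under this assumption, keeping a separate set of gate logits for each task would let each downstream task emphasize different encoder layers while the Whisper weights stay frozen, which matches the abstract's claim of adapting without full retraining.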

