CL | SD | AS | Jan 14, 2025

Selective Attention Merging for low resource tasks: A case study of Child ASR

arXiv:2501.08468v1 | 7 citations | h-index: 10 | ICASSP
Originality: Incremental advance
AI Analysis

This work addresses low-resource ASR for child speech. The advance is incremental: it builds on existing model merging and data augmentation techniques.

The paper tackles the poor performance of Speech Foundation Models on low-resource child Automatic Speech Recognition by introducing Selective Attention (SA) Merge, a model merging technique. On the MyST database, SA Merge achieves up to a 14% relative reduction in word error rate (WER); combined with data augmentation, it reaches a new state-of-the-art WER of 8.69 for the Whisper-small model.

While Speech Foundation Models (SFMs) excel in various speech tasks, their performance for low-resource tasks such as child Automatic Speech Recognition (ASR) is hampered by limited pretraining data. To address this, we explore different model merging techniques to leverage knowledge from models trained on larger, more diverse speech corpora. This paper also introduces Selective Attention (SA) Merge, a novel method that selectively merges task vectors from attention matrices to enhance SFM performance on low-resource tasks. Experiments on the MyST database show significant reductions in relative word error rate of up to 14%, outperforming existing model merging and data augmentation techniques. By combining data augmentation techniques with SA Merge, we achieve a new state-of-the-art WER of 8.69 on the MyST database for the Whisper-small model, highlighting the potential of SA Merge for improving low-resource ASR.
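
The abstract describes SA Merge only at a high level, so the following is a minimal sketch of the general idea rather than the paper's exact recipe. It assumes a task vector is the element-wise difference between fine-tuned and pretrained weights, that attention parameters can be identified by a hypothetical "attn" substring in their names, and that the two task vectors are blended with a hypothetical interpolation weight `lam`. Non-attention parameters are simply kept from the target (child-speech) model. Weights are plain Python lists to keep the example dependency-free.

```python
# Sketch only: parameter names, the "attn" name filter, and the blending
# rule (lam) are illustrative assumptions, not the paper's exact method.

def task_vector(finetuned, pretrained):
    """Per-parameter difference: finetuned - pretrained."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def sa_merge(pretrained, target_ft, donor_ft, lam=0.5,
             is_attention=lambda name: "attn" in name):
    """Blend task vectors for attention parameters only.

    pretrained: base model weights (name -> list of floats)
    target_ft:  weights fine-tuned on the low-resource task (e.g. child speech)
    donor_ft:   weights fine-tuned on a larger, more diverse corpus
    lam:        hypothetical weight on the target task vector
    """
    tv_target = task_vector(target_ft, pretrained)
    tv_donor = task_vector(donor_ft, pretrained)
    merged = {}
    for name, base in pretrained.items():
        if is_attention(name):
            # Attention matrices: interpolate the two task vectors.
            delta = [lam * t + (1 - lam) * d
                     for t, d in zip(tv_target[name], tv_donor[name])]
        else:
            # Everything else: keep the target model's update unchanged.
            delta = tv_target[name]
        merged[name] = [b + d for b, d in zip(base, delta)]
    return merged


# Toy weights with made-up parameter names.
pretrained = {"encoder.attn.q": [0.0, 1.0], "encoder.mlp.w": [1.0, 1.0]}
target_ft = {"encoder.attn.q": [1.0, 2.0], "encoder.mlp.w": [2.0, 0.0]}
donor_ft = {"encoder.attn.q": [3.0, 1.0], "encoder.mlp.w": [5.0, 5.0]}

merged = sa_merge(pretrained, target_ft, donor_ft, lam=0.5)
```

In this toy setup only `encoder.attn.q` receives information from the donor model, while `encoder.mlp.w` ends up identical to the target model's weights. With real models the same loop would run over tensors from each model's state dictionary.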

Code Implementations: 1 repo