SD LG AS Oct 1, 2021

Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device

arXiv:2110.00155v1, 8 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient on-device domain adaptation for speech recognition. It is an incremental advance: it builds on existing self-supervised learning methods and adapts them to the memory constraints of mobile devices.

The paper tackles domain mismatch in streaming speech recognition models on mobile devices by proposing an incremental layer-wise self-supervised learning algorithm. The algorithm achieves a Word Error Rate (WER) on the target domain that is 24.2% better than a supervised baseline, and it uses 89.7% less training memory than end-to-end self-supervised learning.

Streaming end-to-end speech recognition models have been widely applied to mobile devices and show significant improvement in efficiency. These models are typically trained on the server using transcribed speech data. However, the server data distribution can be very different from the data distribution on user devices, which could affect the model performance. There are two main challenges for on-device training: limited reliable labels and limited training memory. While self-supervised learning algorithms can mitigate the mismatch between domains using unlabeled data, they are not directly applicable on mobile devices because of the memory constraint. In this paper, we propose an incremental layer-wise self-supervised learning algorithm for efficient speech domain adaptation on mobile devices, in which only one layer is updated at a time. Extensive experimental results demonstrate that the proposed algorithm obtains a Word Error Rate (WER) on the target domain $24.2\%$ better than the supervised baseline and costs $89.7\%$ less training memory than the end-to-end self-supervised learning algorithm.
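
The phrase "only one layer is updated at a time" can be illustrated with a minimal sketch. This is not the paper's implementation. The toy encoder (a chain of scalar `tanh` layers), the denoising-style placeholder objective, the finite-difference gradient, and every name below (`encode`, `stage_loss`, `train_layerwise`) are assumptions made for illustration only. The sketch shows just the schedule: at stage `k`, a gradient is computed for layer `k` alone, and every other layer stays frozen, so gradient state is needed for a single layer at any time.

```python
import math
import random

def encode(weights, x, top):
    """Run x through layers 0..top of a toy encoder (each layer: h <- tanh(w * h))."""
    h = x
    for w in weights[: top + 1]:
        h = math.tanh(w * h)
    return h

def stage_loss(weights, top, batch):
    """Placeholder denoising objective measured at the output of layer `top`.

    Hypothetical stand-in for a self-supervised loss; not the paper's objective.
    """
    return sum((encode(weights, noisy, top) - clean) ** 2
               for noisy, clean in batch) / len(batch)

def train_layerwise(weights, batch, steps=50, lr=0.1, eps=1e-4):
    """Update one layer at a time; all other layers stay frozen."""
    weights = list(weights)  # work on a copy so the caller's list is untouched
    history = []             # (layer index, snapshot of all weights after that stage)
    for k in range(len(weights)):
        for _ in range(steps):
            # Central finite difference with respect to the single active weight.
            up = weights[:k] + [weights[k] + eps] + weights[k + 1:]
            down = weights[:k] + [weights[k] - eps] + weights[k + 1:]
            grad = (stage_loss(up, k, batch) - stage_loss(down, k, batch)) / (2 * eps)
            weights[k] -= lr * grad  # only layer k is modified in this stage
        history.append((k, list(weights)))
    return weights, history

# Unlabeled "target domain" data: noisy inputs paired with their clean versions.
rng = random.Random(0)
clean = [rng.uniform(-0.5, 0.5) for _ in range(32)]
batch = [(c + rng.gauss(0, 0.05), c) for c in clean]

initial = [1.0, 1.0, 1.0]
final, history = train_layerwise(initial, batch)
```

Comparing consecutive snapshots in `history` confirms that each stage changes exactly one entry of the weight list. In a real model the same idea would mean holding gradients and optimizer state for one layer rather than the whole network, which is the kind of saving the abstract attributes to the layer-wise schedule.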
