AS CL LG SD | Jun 17, 2024

Self-Train Before You Transcribe

arXiv:2406.12937v1 | 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses domain adaptation for speech recognition in scenarios where collecting separate unlabelled target-domain data is impractical. It offers an incremental improvement over existing self-training methods.

The paper tackles the problem of speech recognition performance degradation due to domain mismatch by proposing a test-time adaptation method using noisy student teacher training on test set recordings, achieving relative gains of up to 32.2%.

When there is a mismatch between the training and test domains, current speech recognition systems show significant performance degradation. Self-training methods, such as noisy student teacher training, can help address this and enable the adaptation of models under such domain shifts. However, self-training typically requires a collection of unlabelled target domain data. For settings where this is not practical, we investigate the benefit of performing noisy student teacher training on recordings in the test set as a test-time adaptation approach. Similarly to the dynamic evaluation approach in language modelling, this enables the transfer of information across utterance boundaries and functions as a method of domain adaptation. A range of in-domain and out-of-domain datasets are used for experiments demonstrating large relative gains of up to 32.2%. Interestingly, our method showed larger gains than the typical self-training setup that utilises separate adaptation data.
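
The adapt-then-transcribe idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a 1-D logistic classifier stands in for the speech recogniser, scalar features stand in for utterances, Gaussian input noise stands in for the augmentation, and the teacher is assumed to be a frozen copy of the source model. The function name `self_train_then_transcribe` and all hyperparameters are hypothetical.

```python
import copy
import math
import random


def predict_prob(model, x):
    """Probability of class 1 under a toy 1-D logistic model."""
    return 1.0 / (1.0 + math.exp(-(model["w"] * x + model["b"])))


def self_train_then_transcribe(model, utterances, epochs=5, lr=0.1,
                               noise_std=0.3, seed=0):
    """Adapt a copy of `model` on the test utterances, then label them.

    Assumptions (not taken from the paper): frozen teacher, plain SGD,
    Gaussian input noise as the student's augmentation.
    """
    rng = random.Random(seed)
    teacher = copy.deepcopy(model)   # frozen; produces the pseudo-labels
    student = copy.deepcopy(model)   # updated; the source model stays untouched

    for _ in range(epochs):
        for x in utterances:
            # Teacher labels the clean input.
            pseudo = 1.0 if predict_prob(teacher, x) >= 0.5 else 0.0
            # Student sees a noised version of the same input.
            x_noisy = x + rng.gauss(0.0, noise_std)
            p = predict_prob(student, x_noisy)
            # Gradient of the cross-entropy loss with respect to the logit.
            grad = p - pseudo
            student["w"] -= lr * grad * x_noisy
            student["b"] -= lr * grad

    # Final pass: "transcribe" the same utterances with the adapted student.
    predictions = [1 if predict_prob(student, x) >= 0.5 else 0
                   for x in utterances]
    return predictions, student
```

In this sketch the adaptation data and the evaluation data are the same recordings, which is what makes it test-time adaptation rather than the usual self-training setup with a separate unlabelled set. Because the student is updated across all utterances before the final pass, information from one utterance can influence the output for another.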
