CL | SD | AS | May 30, 2025

MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR

arXiv:2505.24656v21 citations | h-index: 43 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This paper addresses domain adaptation challenges in ASR, particularly for low-resource languages like Greek, but it is an incremental advance because it builds on existing semi-supervised and self-supervised techniques.

The paper tackles unsupervised domain adaptation for automatic speech recognition by introducing a multi-stage pipeline that combines pseudo-labeling and self-supervision, and it reports state-of-the-art results with significant performance improvements over existing methods.

In this work, we investigate the Meta PL unsupervised domain adaptation framework for Automatic Speech Recognition (ASR). We introduce a Multi-Stage Domain Adaptation pipeline (MSDA), a sample-efficient, two-stage adaptation approach that integrates self-supervised learning with semi-supervised techniques. MSDA is designed to enhance the robustness and generalization of ASR models, making them more adaptable to diverse conditions. It is particularly effective for low-resource languages like Greek and in weakly supervised scenarios where labeled data is scarce or noisy. Through extensive experiments, we demonstrate that Meta PL can be applied effectively to ASR tasks, achieving state-of-the-art results, significantly outperforming existing methods, and providing more robust solutions for unsupervised domain adaptation in ASR. Our ablations highlight the necessity of a cascading approach when combining self-supervision with self-training.
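As a rough illustration of the pseudo-labeling ingredient mentioned in the abstract, the sketch below shows generic confidence-thresholded pseudo-label selection. It is not the paper's MSDA pipeline and not Meta PL itself (which also updates the teacher from student feedback). The function name `select_pseudo_labels`, the `teacher` callable, the 0.9 threshold, and the toy utterances are all assumptions made for this example.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical teacher interface: maps an utterance id to (transcript, confidence).
Teacher = Callable[[str], Tuple[str, float]]


def select_pseudo_labels(
    utterances: Iterable[str],
    teacher: Teacher,
    min_confidence: float = 0.9,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep only the (utterance, transcript) pairs the teacher is confident about."""
    selected = []
    for utt in utterances:
        transcript, confidence = teacher(utt)
        if confidence >= min_confidence:
            selected.append((utt, transcript))
    return selected


def toy_teacher(utt: str) -> Tuple[str, float]:
    """Stand-in for a source-domain ASR model decoding unlabeled target audio."""
    table = {
        "utt1": ("kalimera", 0.97),
        "utt2": ("efharisto", 0.55),
        "utt3": ("nai", 0.92),
    }
    return table[utt]


# Only the confident hypotheses would be passed on to train a student model.
pairs = select_pseudo_labels(["utt1", "utt2", "utt3"], toy_teacher)
print(pairs)
```

In a cascade of the kind the abstract describes, a filter like this would presumably sit in the self-training stage, after the self-supervised adaptation stage has run on the unlabeled target audio.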

