LG · CL · Aug 27, 2025

Heterogeneous Self-Supervised Acoustic Pre-Training with Local Constraints

arXiv:2508.19990v2 · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the challenge of handling multi-domain and multilingual data in speech recognition, offering an incremental improvement over conventional methods.

The paper tackles the problem of self-supervised pre-training with heterogeneous data by proposing a bilevel optimization approach with local constraints, which significantly improves model adaptivity for downstream tasks.

Self-supervised pre-training on unlabeled data is widely used in automatic speech recognition. In this paper, we propose a new self-supervised pre-training approach for heterogeneous data. Instead of mixing all the data and minimizing the averaged global loss in the conventional way, we impose additional local constraints which require that $K$ steps of gradient descent, initialized from the model, bring each source of heterogeneous data to its local optimum. We formulate this as a bilevel optimization problem and solve it with a first-order approximation method. We also discuss its connection to model-agnostic meta-learning. Experiments on self-supervised pre-training with multi-domain and multilingual datasets demonstrate that the proposed approach can significantly improve the adaptivity of the self-supervised pre-trained model for downstream supervised fine-tuning tasks.
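The abstract gives no equations, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the authors' method. Each source i has a loss L_i; the local constraints are relaxed into a penalty term weighted by a hypothetical `lam`; and the gradient of each source's loss after K inner gradient-descent steps is replaced by its first-order approximation, as in first-order MAML (the Jacobian of the inner loop is dropped). Toy quadratic losses stand in for real acoustic losses, and all names (`inner_adapt`, `outer_update`, `quadratic_grad`, `lam`) are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def step(x: Sequence[float], g: Sequence[float], lr: float) -> Vector:
    """Return x - lr * g, elementwise."""
    return [xi - lr * gi for xi, gi in zip(x, g)]


def inner_adapt(theta: Vector, grad_fn: GradFn, k: int, inner_lr: float) -> Vector:
    """K steps of plain gradient descent on one source, initialized from theta."""
    phi = list(theta)
    for _ in range(k):
        phi = step(phi, grad_fn(phi), inner_lr)
    return phi


def outer_update(theta: Vector, grad_fns: List[GradFn], k: int,
                 inner_lr: float, outer_lr: float, lam: float) -> Vector:
    """One outer step on: mean_i L_i(theta) + lam * mean_i L_i(phi_i^K(theta)).

    Assumed first-order approximation: d L_i(phi_i^K) / d theta is replaced by
    grad L_i evaluated at phi_i^K, i.e. the inner-loop Jacobian is ignored.
    """
    n = len(grad_fns)
    total = [0.0] * len(theta)
    for grad_fn in grad_fns:
        g_global = grad_fn(theta)                        # averaged-loss term
        phi = inner_adapt(theta, grad_fn, k, inner_lr)   # K-step local adaptation
        g_local = grad_fn(phi)                           # first-order local term
        for j in range(len(theta)):
            total[j] += (g_global[j] + lam * g_local[j]) / n
    return step(theta, total, outer_lr)


def quadratic_grad(center: Vector, curvature: float) -> GradFn:
    """Gradient of the toy loss 0.5 * curvature * ||theta - center||^2."""
    def grad(theta: Vector) -> Vector:
        return [curvature * (t - c) for t, c in zip(theta, center)]
    return grad


# Two hypothetical "sources" with different optima and curvatures.
sources = [quadratic_grad([1.0, 0.0], 1.0), quadratic_grad([-1.0, 2.0], 4.0)]

theta_plain = [0.0, 0.0]  # lam = 0: conventional averaged global loss only
theta_local = [0.0, 0.0]  # lam = 1: averaged loss plus K-step local term
for _ in range(200):
    theta_plain = outer_update(theta_plain, sources, k=3, inner_lr=0.1,
                               outer_lr=0.1, lam=0.0)
    theta_local = outer_update(theta_local, sources, k=3, inner_lr=0.1,
                               outer_lr=0.1, lam=1.0)

print("averaged loss only:", [round(v, 3) for v in theta_plain])
print("with local term:   ", [round(v, 3) for v in theta_local])
```

With `lam=0.0` the update reduces to minimizing the averaged loss, which for these quadratics converges to the curvature-weighted mean of the optima, (-0.6, 1.6). With `lam=1.0` the solution shifts toward the flatter first source, the one whose optimum three inner steps leave furthest away. This illustrates, only in a toy setting, how a term on the K-step-adapted losses changes the pre-trained point compared with plain data mixing.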

