SD, AI, AS | Aug 17, 2025

HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization

arXiv:2508.12292v1 | 1 citation | h-index: 5 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This paper addresses the degradation of automatic speech recognition performance that users of speech foundation models see under noisy conditions. It represents an incremental improvement.

The paper tackles noise robustness in speech foundation models by proposing HuBERT-VIC, which uses variance-invariance-covariance regularization to improve generalization to noisy speech. It reports relative performance improvements of 23.3% on LibriSpeech test-clean and 13.2% on test-other compared to a baseline model pre-trained on noisy speech.
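As a reading aid, the sketch below shows how a relative improvement figure of this kind is typically computed. It assumes the metric is word error rate (WER), where lower is better; the summary does not name the metric explicitly. The function name `relative_improvement` and the WER values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_wer: float, model_wer: float) -> float:
    """Relative error reduction in percent, assuming a lower-is-better metric such as WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - model_wer) / baseline_wer


# Hypothetical WER values for illustration only (not results from the paper):
# a drop from 10.0% to 8.0% WER is a 20% relative improvement.
print(relative_improvement(10.0, 8.0))  # 20.0
```

A relative figure depends on the baseline's absolute error, so the same absolute gain looks larger on an easier test set where the baseline error is already small.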

Noise robustness in speech foundation models (SFMs) has been a critical challenge, as most models are primarily trained on clean data and experience performance degradation when exposed to noisy speech. To address this issue, we propose HuBERT-VIC, a noise-robust SFM with variance, invariance, and covariance regularization (VICReg) objectives. These objectives adjust the statistics of noisy speech representations, enabling the model to capture diverse acoustic characteristics and improving the generalization ability across different types of noise. When applied to HuBERT, our model shows relative performance improvements of 23.3% on LibriSpeech test-clean and 13.2% on test-other, compared to the baseline model pre-trained on noisy speech.
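To make the three VICReg terms concrete, here is a minimal NumPy sketch in the style of the general VICReg objective. It is not the authors' implementation. It assumes two batches of paired embeddings of shape (N, D), for example representations of a noisy utterance and its clean counterpart; how HuBERT-VIC forms these pairs and weights the terms is not stated in the abstract. The function names and the default weights (25, 25, 1, as commonly used with VICReg) are assumptions for illustration.

```python
import numpy as np


def vicreg_terms(z_a, z_b, gamma=1.0, eps=1e-4):
    """Return (invariance, variance, covariance) terms for two (N, D) batches."""
    n, d = z_a.shape

    # Invariance: mean squared distance between paired embeddings.
    invariance = np.mean((z_a - z_b) ** 2)

    def variance_term(z):
        # Hinge on the per-dimension standard deviation: penalize dimensions
        # whose spread across the batch falls below gamma (guards against collapse).
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, gamma - std))

    def covariance_term(z):
        # Penalize squared off-diagonal covariance entries so that
        # different dimensions carry decorrelated information.
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    variance = 0.5 * (variance_term(z_a) + variance_term(z_b))
    covariance = covariance_term(z_a) + covariance_term(z_b)
    return invariance, variance, covariance


def vicreg_loss(z_a, z_b, w_inv=25.0, w_var=25.0, w_cov=1.0):
    """Weighted sum of the three terms; the weights here are assumed defaults."""
    inv, var, cov = vicreg_terms(z_a, z_b)
    return w_inv * inv + w_var * var + w_cov * cov


# Usage with synthetic data: "noisy" is a perturbed copy of "clean".
rng = np.random.default_rng(0)
clean = rng.normal(size=(8, 4))
noisy = clean + 0.1 * rng.normal(size=(8, 4))
print(vicreg_loss(noisy, clean))
```

The invariance term pulls paired embeddings together, while the variance and covariance terms act on each batch's statistics. This matches the abstract's description of adjusting the statistics of noisy speech representations.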

