CL · LG · ML · Jan 13, 2024

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

arXiv:2401.06980v1 · 8 citations · h-index: 12 · ICASSP
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving ASR accuracy when labeled data is limited. It appears incremental because it builds on existing bilevel optimization methods.

The paper proposes a bilevel optimization approach to training acoustic models for automatic speech recognition that jointly uses unsupervised and supervised losses. On the LibriSpeech and TED-LIUM v2 datasets, it outperforms the common strategy of pre-training followed by fine-tuning.

In this paper, we present a novel bilevel optimization-based training approach to training acoustic models for automatic speech recognition (ASR) tasks that we term bi-level joint unsupervised and supervised training (BL-JUST). BL-JUST employs a lower and upper level optimization with an unsupervised loss and a supervised loss respectively, leveraging recent advances in penalty-based bilevel optimization to solve this challenging ASR problem with affordable complexity and rigorous convergence guarantees. To evaluate BL-JUST, extensive experiments on the LibriSpeech and TED-LIUM v2 datasets have been conducted. BL-JUST achieves superior performance over the commonly used pre-training followed by fine-tuning strategy.
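
To make the penalty-based idea concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes the general penalty recipe: replace the lower-level constraint "the parameters minimize the unsupervised loss" with a penalty term `gamma * unsup_loss` added to the supervised loss, and grow `gamma` over training. The two quadratic losses, the schedule, the step size, and all function names are hypothetical stand-ins for a real acoustic model and its losses.

```python
# Toy penalty-based bilevel optimization (illustrative assumptions only).
# Upper level:  minimize sup_loss(theta)
# Lower level:  theta must minimize unsup_loss(theta)
# Penalty form: minimize sup_loss(theta) + gamma * unsup_loss(theta)


def sup_grad(theta):
    # Gradient of the hypothetical supervised loss
    # 0.5 * ((theta[0] - 2)^2 + (theta[1] - 1)^2).
    return [theta[0] - 2.0, theta[1] - 1.0]


def unsup_loss(theta):
    # Hypothetical unsupervised loss; its minimizers form the line
    # theta[0] + theta[1] = 2, so the lower level has many solutions.
    return 0.5 * (theta[0] + theta[1] - 2.0) ** 2


def unsup_grad(theta):
    r = theta[0] + theta[1] - 2.0
    return [r, r]


def penalty_bilevel(steps=5000, gamma_max=50.0):
    theta = [0.0, 0.0]
    for t in range(steps):
        # Assumed schedule: the penalty weight ramps up linearly.
        gamma = gamma_max * (t + 1) / steps
        # Step size tied to the curvature of the penalized objective,
        # whose largest Hessian eigenvalue here is 1 + 2 * gamma.
        lr = 1.0 / (1.0 + 2.0 * gamma)
        gs, gu = sup_grad(theta), unsup_grad(theta)
        # Single-loop update on the combined gradient.
        theta = [p - lr * (a + gamma * b) for p, a, b in zip(theta, gs, gu)]
    return theta


theta = penalty_bilevel()
print(theta, unsup_loss(theta))
```

In this toy problem the exact bilevel solution is the point on the line `theta[0] + theta[1] = 2` closest to `(2, 1)`, namely `(1.5, 0.5)`. The penalized iterate approaches it as `gamma` grows, which is the sense in which a single-loop penalty method can stand in for a nested bilevel solve.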
