CL LG ML Jan 13, 2024

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen

arXiv:2401.06980v14.28 citations h-index: 12 ICASSP

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of improving ASR accuracy with limited labeled data, though the contribution appears incremental because it builds on existing bilevel optimization methods.

The paper tackles the problem of training acoustic models for automatic speech recognition by proposing a bilevel optimization approach that jointly uses unsupervised and supervised losses, achieving superior performance over pre-training and fine-tuning on LibriSpeech and TED-LIUM v2 datasets.

In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR) tasks, which we term bi-level joint unsupervised and supervised training (BL-JUST). BL-JUST employs lower- and upper-level optimization with an unsupervised loss and a supervised loss, respectively, leveraging recent advances in penalty-based bilevel optimization to solve this challenging ASR problem with affordable complexity and rigorous convergence guarantees. To evaluate BL-JUST, we conducted extensive experiments on the LibriSpeech and TED-LIUM v2 datasets. BL-JUST achieves superior performance over the commonly used strategy of pre-training followed by fine-tuning.
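To make the penalty-based idea in the abstract concrete, here is a minimal toy sketch. It is not the authors' algorithm or code. It assumes the bilevel problem "minimize a supervised loss f over the minimizers of an unsupervised loss g" is relaxed to the single-level objective f(theta) + gamma * g(theta) with a gradually increasing penalty gamma, and it uses hypothetical quadratic losses in place of real ASR losses. All names (`supervised_grad`, `unsupervised_grad`, `penalty_bilevel_descent`, `gamma_max`) are illustrative assumptions.

```python
import numpy as np


def supervised_grad(theta, a):
    """Gradient of the toy upper-level loss f(theta) = 0.5 * ||theta - a||^2."""
    return theta - a


def unsupervised_grad(theta, b):
    """Gradient of the toy lower-level loss g(theta) = 0.5 * (theta[0] - b)^2.

    g only constrains the first coordinate, so its minimizers form a set
    and the upper-level loss is free to choose the remaining coordinate.
    """
    grad = np.zeros_like(theta, dtype=float)
    grad[0] = theta[0] - b
    return grad


def penalty_bilevel_descent(a, b, gamma_max=100.0, epochs=200, steps_per_epoch=50):
    """Gradient descent on f(theta) + gamma * g(theta) with a growing penalty.

    Assumption: a linear penalty schedule and a step size of 1 / (1 + gamma),
    chosen here only because it is stable for these quadratic toy losses.
    """
    theta = np.zeros_like(a, dtype=float)
    for epoch in range(epochs):
        gamma = gamma_max * (epoch + 1) / epochs  # penalty grows each epoch
        lr = 1.0 / (1.0 + gamma)
        for _ in range(steps_per_epoch):
            grad = supervised_grad(theta, a) + gamma * unsupervised_grad(theta, b)
            theta = theta - lr * grad
    return theta


if __name__ == "__main__":
    a = np.array([2.0, -1.0])  # hypothetical supervised target
    b = 0.5                    # hypothetical unsupervised target for theta[0]
    print(penalty_bilevel_descent(a, b))
```

In this toy setting the first coordinate is pulled toward the unsupervised minimizer `b` as the penalty grows, while the second coordinate, which the unsupervised loss leaves free, settles at the supervised optimum `a[1]`. A finite penalty leaves a small gap between `theta[0]` and `b`, which shrinks as `gamma_max` increases.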
