AS CL LG SD
Jun 9, 2023

A Theory of Unsupervised Speech Recognition

Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo

arXiv:2306.07926v136.9224 citations
h-index: 44
Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses training instability and hyperparameter sensitivity in ASR-U, but it is incremental: it builds on existing algorithms rather than introducing a new method.

The paper tackles the lack of a theoretical framework for unsupervised speech recognition (ASR-U) by proposing one based on random matrix theory and neural tangent kernels. It proves learnability conditions and sample complexity bounds, and experiments on synthetic languages provide empirical support.

Unsupervised speech recognition (ASR-U) is the problem of learning automatic speech recognition (ASR) systems from unpaired speech-only and text-only corpora. While various algorithms exist to solve this problem, a theoretical framework for studying their properties and addressing issues such as sensitivity to hyperparameters and training instability is missing. In this paper, we propose a general theoretical framework for studying the properties of ASR-U systems, based on random matrix theory and the theory of neural tangent kernels. This framework allows us to prove various learnability conditions and sample complexity bounds for ASR-U. Extensive ASR-U experiments on synthetic languages with three classes of transition graphs provide strong empirical evidence for our theory (code available at cactuswiththoughts/UnsupASRTheory.git).
