AS CL LG SD | Jan 16, 2023

BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition

arXiv:2301.11276v1

Originality: Incremental advance

AI Analysis

This work addresses the need for more efficient and accurate speech recognition models, though it appears incremental as it builds on existing transformer architectures with a Bayesian approach.

The paper tackles end-to-end automatic speech recognition by proposing BayesSpeech, a Bayesian Transformer Network that uses variational inference and the local reparameterization trick to approximate intractable weight posteriors. The authors report faster training and near state-of-the-art performance on LibriSpeech-960.

Recent End-to-End Deep Learning models have been shown to perform near or better than state-of-the-art Recurrent Neural Networks (RNNs) on Automatic Speech Recognition tasks. These models tend to be lighter weight and require less training time than traditional RNN-based approaches. However, these models take a frequentist approach to weight training. In theory, network weights are drawn from a latent, intractable probability distribution. We introduce BayesSpeech for end-to-end Automatic Speech Recognition. BayesSpeech is a Bayesian Transformer Network, built without recurrence, in which these intractable posteriors are learned through variational inference and the local reparameterization trick. We show how the introduction of variance in the weights leads to faster training time and near state-of-the-art performance on LibriSpeech-960.
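
The abstract names the local reparameterization trick without showing it, so a minimal sketch of one Bayesian linear layer may help. It is not the paper's implementation. It assumes a factorized Gaussian posterior over each weight (mean `mu`, standard deviation `softplus(rho)`) and a standard normal prior, and the names `bayes_linear_lrt` and `kl_to_standard_normal` are hypothetical. Rather than sampling every weight, the layer samples each pre-activation directly from the Gaussian implied by the weight posterior.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)); maps rho to a positive std dev.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bayes_linear_lrt(x, mu, rho, rng):
    """Sample the outputs of a Bayesian linear layer for one input vector.

    x:   list of input features (length n_in)
    mu:  posterior means, shape [n_out][n_in]
    rho: unconstrained posterior scales, same shape; sigma = softplus(rho)
    rng: a random.Random instance (assumed source of Gaussian noise)
    """
    outputs = []
    for mu_row, rho_row in zip(mu, rho):
        # Mean and variance of the pre-activation under the weight posterior.
        act_mean = sum(m * xi for m, xi in zip(mu_row, x))
        act_var = sum(softplus(r) ** 2 * xi * xi for r, xi in zip(rho_row, x))
        # One noise draw per output unit instead of one per weight.
        eps = rng.gauss(0.0, 1.0)
        outputs.append(act_mean + math.sqrt(act_var) * eps)
    return outputs


def kl_to_standard_normal(mu, rho):
    """KL(q || p) for factorized Gaussian q and an assumed N(0, 1) prior p."""
    kl = 0.0
    for mu_row, rho_row in zip(mu, rho):
        for m, r in zip(mu_row, rho_row):
            sigma = softplus(r)
            kl += 0.5 * (sigma ** 2 + m ** 2 - 1.0) - math.log(sigma)
    return kl
```

In a variational training loop, the loss would typically combine the task loss computed on the sampled outputs with this KL term, scaled by a weighting the practitioner chooses. How BayesSpeech weights the two terms is not stated in the abstract.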
