CL, Aug 19, 2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

arXiv:2308.10107v1, 1 citation, h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses the need for more efficient, lower-latency speech recognition by offering a method to control the alignments a transducer predicts. The contribution is incremental, since it builds on the existing transducer framework.

The paper tackles uncontrollable alignment prediction in transducer-based automatic speech recognition. It proposes the Bayes Risk Transducer (BRT), which enforces preferred paths to make the predicted alignment controllable. The reported gains are up to 46% inference cost savings for non-streaming ASR and a 41% latency reduction for streaming ASR.

Automatic speech recognition (ASR) based on transducers is widely used. In training, a transducer maximizes the summed posteriors of all paths. The path with the highest posterior is commonly defined as the predicted alignment between the speech and the transcription. While the vanilla transducer does not have a prior preference for any of the valid paths, this work intends to enforce the preferred paths and achieve controllable alignment prediction. Specifically, this work proposes Bayes Risk Transducer (BRT), which uses a Bayes risk function to set lower risk values to the preferred paths so that the predicted alignment is more likely to satisfy specific desired properties. We further demonstrate that these predicted alignments with intentionally designed properties can provide practical advantages over the vanilla transducer. Experimentally, the proposed BRT saves inference cost by up to 46% for non-streaming ASR and reduces overall system latency by 41% for streaming ASR.
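The contrast between the two training objectives described above can be illustrated with a toy sketch. It enumerates every path in a tiny transducer lattice and compares the vanilla objective (log of the summed path posteriors) with a risk-weighted one. The abstract does not state the form of the risk function, so the exponential weighting `exp(-scale * risk)`, the `last_label_frame` risk, and all function names here are assumptions made for illustration, not the paper's formulation. A practical implementation would also use a forward-backward recursion rather than brute-force enumeration.

```python
import itertools
import math


def enumerate_paths(T, U):
    """Yield every transducer path for T frames and U labels.

    'b' is a blank (advance one frame), 'l' emits the next label.
    Every path ends with the blank that leaves the last frame.
    """
    for label_slots in itertools.combinations(range(T - 1 + U), U):
        moves = ["b"] * (T - 1 + U)
        for i in label_slots:
            moves[i] = "l"
        yield tuple(moves) + ("b",)


def path_log_prob(path, log_blank, log_label):
    """Sum the log-probabilities along one path through the lattice."""
    t = u = 0
    total = 0.0
    for move in path:
        if move == "b":
            total += log_blank[t][u]
            t += 1
        else:
            total += log_label[t][u]
            u += 1
    return total


def last_label_frame(path):
    """Hypothetical risk: frame index at which the last label is emitted."""
    t = last = 0
    for move in path:
        if move == "b":
            t += 1
        else:
            last = t
    return last


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def vanilla_objective(T, U, log_blank, log_label):
    """Log of the summed posteriors of all paths (no path preference)."""
    return log_sum_exp(
        [path_log_prob(p, log_blank, log_label) for p in enumerate_paths(T, U)]
    )


def risk_weighted_objective(T, U, log_blank, log_label, risk, scale=1.0):
    """Assumed weighting: each path posterior is scaled by exp(-scale * risk)."""
    return log_sum_exp(
        [
            path_log_prob(p, log_blank, log_label) - scale * risk(p)
            for p in enumerate_paths(T, U)
        ]
    )


# Toy lattice: 3 frames, 2 labels, every move has log-probability log(0.5).
T, U = 3, 2
log_blank = [[math.log(0.5)] * (U + 1) for _ in range(T)]
log_label = [[math.log(0.5)] * U for _ in range(T)]

vanilla = vanilla_objective(T, U, log_blank, log_label)
weighted = risk_weighted_objective(T, U, log_blank, log_label, last_label_frame)
```

With uniform scores the vanilla objective treats all six paths alike, while the risk-weighted score of each path drops as its last label is emitted later. Under this assumed risk, the highest-scoring path is therefore the one that emits both labels at the first frame, which is the kind of early-emission property that matters for latency.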

Code Implementations: 1 repo