SD AS · Mar 16, 2021

An Asynchronous WFST-Based Decoder For Automatic Speech Recognition

arXiv:2103.09063v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in automatic speech recognition systems, offering incremental improvements for faster decoding in practical applications.

The paper tackles the computational overhead of one-pass decoding for large vocabulary continuous speech recognition by introducing an asynchronous dynamic decoder with a novel two-front design. The reported result is notably faster decoding, with the speedup growing as data complexity increases.

We introduce an asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models into one-pass decoding for large vocabulary continuous speech recognition. Standard one-pass decoding with an on-the-fly composition decoder can incur significant computational overhead. In contrast, the asynchronous dynamic decoder has a novel design with two fronts, one performing "exploration" and the other "backfill". The computation of the two fronts alternates during decoding, resulting in more effective pruning than standard one-pass decoding with an on-the-fly composition decoder. Experiments show that the proposed decoder is notably faster than standard one-pass decoding with an on-the-fly composition decoder, and that the speedup becomes more pronounced as data complexity increases.
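For readers unfamiliar with the A* search that the abstract refers to, the following is a minimal sketch of textbook A* on a tiny weighted graph. It is not the paper's decoder: the abstract does not specify the heuristic, the graph structure, or how the "exploration" and "backfill" fronts interact, so the function name `a_star`, the toy `arcs` graph, and the `heuristic` table are all hypothetical and chosen purely for illustration. The sketch only shows the general idea that a lower bound on the remaining cost lets the search expand promising states first.

```python
import heapq


def a_star(arcs, start, goal, heuristic):
    """Return (cost, path) for the cheapest path from start to goal, or None.

    arcs:      dict mapping state -> list of (next_state, cost), with cost >= 0
    heuristic: dict mapping state -> lower bound on the remaining cost to goal
               (assumed admissible; this is an illustrative assumption)
    """
    # Queue entries: (cost so far + heuristic, cost so far, state, path).
    frontier = [(heuristic[start], 0.0, start, [start])]
    best = {start: 0.0}  # cheapest known cost to reach each state
    while frontier:
        _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        if cost > best.get(state, float("inf")):
            continue  # stale entry: a cheaper route to this state was found
        for nxt, weight in arcs.get(state, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic[nxt], new_cost, nxt, path + [nxt]),
                )
    return None


# Hypothetical toy graph: states are integers, arc costs play the role of
# negative log-probabilities. Every heuristic value is a lower bound on the
# true remaining cost, so the first time the goal is popped it is optimal.
arcs = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.0), (3, 5.0)], 2: [(3, 1.0)]}
heuristic = {0: 2.0, 1: 2.0, 2: 1.0, 3: 0.0}

result = a_star(arcs, 0, 3, heuristic)
print(result)  # (3.0, [0, 1, 2, 3])
```

In this toy setting the heuristic table is simply given. In a real decoder the quality of that bound determines how much of the search space can be pruned, which is the kind of effect the abstract attributes to its two-front design.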
