CL · Dec 8, 2023

Partial Rewriting for Multi-Stage ASR

arXiv:2312.09463v1 · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses a specific issue in real-time streaming ASR: intermediate streaming results do not benefit from the larger second-stage model. It offers an incremental improvement.

The paper improves the streaming results of multi-stage automatic speech recognition (ASR) without adding latency. It reports a quality improvement of around 10% in streaming results while leaving the final results unchanged.

For many streaming automatic speech recognition tasks, it is important to provide timely intermediate streaming results while refining a high-quality final result. This can be done using a multi-stage architecture, where a small left-context-only model creates streaming results and a larger left- and right-context model produces a final result at the end. While this significantly improves the quality of the final results without compromising the streaming emission latency of the system, streaming results do not benefit from the quality improvements. Here, we propose using a text manipulation algorithm that merges the streaming outputs of both models. We improve the quality of streaming results by around 10%, without altering the final results. Our approach introduces no additional latency and reduces flickering. It is also lightweight, does not require retraining the model, and can be applied to a wide variety of multi-stage architectures.
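The abstract does not spell out the merging algorithm, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes that the larger model's partial hypothesis lags behind the small model's (because it waits for right context) and that, where the two overlap, the larger model's words are preferred. The function name `merge_partials`, the word-level alignment via `difflib`, and the example utterance are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def merge_partials(fast_words, slow_words):
    """Merge two partial hypotheses into one streaming result (illustrative).

    fast_words: words from the small left-context-only model
                (assumed up to date but less accurate).
    slow_words: words from the larger model
                (assumed more accurate but covering less audio so far).
    """
    fast_words = list(fast_words)
    slow_words = list(slow_words)
    if not slow_words:
        # The larger model has emitted nothing yet: show the fast result as is.
        return fast_words

    # Align the two word sequences to estimate how much of the fast
    # hypothesis the slow hypothesis already covers.
    matcher = SequenceMatcher(a=fast_words, b=slow_words, autojunk=False)
    blocks = [m for m in matcher.get_matching_blocks() if m.size > 0]

    if blocks:
        last = blocks[-1]
        # Slow words after the last match are assumed to replace the same
        # number of fast words (a simplifying assumption).
        unmatched_slow_tail = len(slow_words) - (last.b + last.size)
        cut = last.a + last.size + unmatched_slow_tail
    else:
        # No common words: assume the slow words replace an equal-length prefix.
        cut = len(slow_words)

    cut = min(cut, len(fast_words))
    # Rewritten prefix from the slow model, remaining tail from the fast model.
    return slow_words + fast_words[cut:]


fast = "i want to fly two boston tomorrow".split()
slow = "i want to fly to boston".split()
print(" ".join(merge_partials(fast, slow)))
# prints "i want to fly to boston tomorrow"
```

In this sketch the final result is untouched because only the displayed partial is rewritten, and no extra latency is introduced because the fast model's tail is always kept. How the actual algorithm aligns the two outputs and limits flickering is described in the paper itself.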

