CL · SD · AS · Jun 12, 2024

PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

arXiv:2406.07823v1 · 23 citations
Originality: Incremental advance
AI Analysis

This work addresses latency issues in voice assistants, offering a practical improvement for real-time applications, though it is incremental as it builds on existing deliberation approaches.

The paper tackles the high latency of autoregressive deliberation models in spoken language understanding by introducing PRoDeliberation, a non-autoregressive method that reduces latency by 2-10x while retaining the ability to correct ASR mistranscriptions.

Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation; however, these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a novel method leveraging a Connectionist Temporal Classification-based decoding strategy as well as a denoising objective to train robust non-autoregressive deliberation models. We show that PRoDeliberation achieves the latency reduction of parallel decoding (2-10x improvement over autoregressive models) while retaining the ability of autoregressive deliberation systems to correct Automatic Speech Recognition (ASR) mistranscriptions. We further show that the design of the denoising training allows PRoDeliberation to overcome the limitations of small ASR devices, and we provide an analysis of the necessity of each component of the system.
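
To make the Connectionist Temporal Classification-based decoding mentioned in the abstract concrete, here is a minimal sketch of generic greedy CTC decoding: every output position picks its best token independently (which is what allows parallel decoding), then consecutive repeats are merged and blanks are dropped. This is an illustration under stated assumptions, not the authors' implementation; the names `BLANK`, `ctc_collapse`, and `greedy_parallel_decode`, the toy vocabulary, and the scores are all hypothetical.

```python
from itertools import groupby

# Hypothetical blank symbol; real systems reserve a vocabulary id for it.
BLANK = "<blank>"


def ctc_collapse(position_tokens, blank=BLANK):
    """Standard CTC collapse: merge consecutive repeats, then drop blanks."""
    merged = [token for token, _ in groupby(position_tokens)]
    return [token for token in merged if token != blank]


def greedy_parallel_decode(scores, vocab, blank=BLANK):
    """Pick the best token at each position independently, then collapse.

    scores: one list of per-token scores for each output position.
    No position depends on another position's choice, so in a real model
    the argmax over all positions can be computed in a single pass.
    """
    best = [vocab[max(range(len(row)), key=row.__getitem__)] for row in scores]
    return ctc_collapse(best, blank)


# Toy example (made-up scores over a three-token vocabulary).
vocab = [BLANK, "x", "y"]
scores = [
    [0.1, 0.8, 0.1],    # "x"
    [0.2, 0.7, 0.1],    # "x" again, merged with the previous position
    [0.9, 0.05, 0.05],  # blank, dropped
    [0.1, 0.2, 0.7],    # "y"
]
decoded = greedy_parallel_decode(scores, vocab)
```

An autoregressive decoder, by contrast, would need one model call per output token, which is the source of the latency gap the abstract describes.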
