CL SD AS

Feb 2, 2024

Streaming Sequence Transduction through Dynamic Compression

Weiting Tan, Yunmo Chen, Tongfei Chen, Guanghui Qin, Haoran Xu, Heidi C. Zhang, Benjamin Van Durme, Philipp Koehn

Microsoft

arXiv:2402.01172v33.42 citations, h-index: 60, Has Code, IWSLT

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of optimizing latency, memory, and quality in streaming tasks such as speech-to-text. The contribution appears incremental, since it builds on existing Transformer-based methods.

The paper tackles the problem of efficient sequence-to-sequence transduction over streams by introducing STAR, a Transformer-based model that dynamically segments input streams to create compressed anchor representations, achieving nearly lossless compression (12x) in Automatic Speech Recognition and outperforming existing methods.

We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams. STAR dynamically segments input streams to create compressed anchor representations, achieving nearly lossless compression (12x) in Automatic Speech Recognition (ASR) and outperforming existing methods. Moreover, STAR demonstrates superior segmentation and latency-quality trade-offs in simultaneous speech-to-text tasks, optimizing latency, memory footprint, and quality.
