CL | SD | AS | Feb 2, 2024

Streaming Sequence Transduction through Dynamic Compression

Microsoft
arXiv:2402.01172v3 | 2 citations | h-index: 60 | IWSLT
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of jointly optimizing latency, memory, and quality in streaming tasks such as speech-to-text. The advance appears incremental because it builds on existing Transformer-based methods.

The paper tackles the problem of efficient sequence-to-sequence transduction over streams by introducing STAR, a Transformer-based model that dynamically segments input streams to create compressed anchor representations, achieving nearly lossless compression (12x) in Automatic Speech Recognition and outperforming existing methods.

We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams. STAR dynamically segments input streams to create compressed anchor representations, achieving nearly lossless compression (12x) in Automatic Speech Recognition (ASR) and outperforming existing methods. Moreover, STAR demonstrates superior segmentation and latency-quality trade-offs in simultaneous speech-to-text tasks, optimizing latency, memory footprint, and quality.
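To make the idea of segmenting a stream and keeping only anchor positions concrete, here is a minimal sketch. It is not the STAR implementation: the abstract does not spell out the segmentation rule, so the accumulate-and-fire rule below is an assumption chosen for illustration, the per-frame scores are set by hand rather than learned, and the names `select_anchors`, `compress`, and `threshold` are hypothetical.

```python
from typing import List, Sequence


def select_anchors(scores: Sequence[float], threshold: float = 1.0) -> List[int]:
    """Return the indices of frames chosen as anchors.

    Assumed rule (illustrative only): accumulate per-frame scores and mark
    an anchor each time the running total reaches `threshold`, carrying any
    remainder over to the next segment.
    """
    anchors: List[int] = []
    total = 0.0
    for i, score in enumerate(scores):
        total += score
        if total >= threshold:
            anchors.append(i)   # this frame closes the current segment
            total -= threshold  # keep the remainder for the next segment
    return anchors


def compress(frames: Sequence, anchors: Sequence[int]) -> list:
    """Keep only the frames at anchor positions."""
    return [frames[i] for i in anchors]


# Toy stream: 24 frames, each with a hand-set score of 0.125.
frames = list(range(24))
scores = [0.125] * 24
anchors = select_anchors(scores, threshold=1.5)
kept = compress(frames, anchors)
ratio = len(frames) / len(kept)
print(anchors, ratio)
```

With these toy values, one anchor is kept for every 12 input frames, which matches the scale of the 12x compression figure quoted above. In a real model the kept items would be encoder representations rather than frame indices.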

Code Implementations: 1 repo
