Mar 4, 2025

InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model

CMU
arXiv:2503.02969v2 · 8 citations · h-index: 6 · Has Code · ACL
Originality: Incremental advance
AI Analysis

The paper addresses real-time translation of streaming speech. The advance is incremental: it builds on prior work by handling unbounded speech more efficiently.

The paper tackles simultaneous translation of unbounded streaming speech. It proposes InfiniSST, which formulates the task as a multi-turn dialogue, and reports a reduction in computation-aware latency of 0.5 to 1 second relative to baselines while maintaining translation quality.

Simultaneous translation of unbounded streaming speech remains a challenging problem due to the need for effectively processing the historical speech context and past translations so that quality and latency, including computation overhead, can be balanced. Most prior works assume pre-segmented speech, limiting their real-world applicability. In this paper, we propose InfiniSST, a novel approach that formulates simultaneous speech translation (SST) as a multi-turn dialogue task, enabling seamless translation of unbounded speech. We construct translation trajectories and robust segments from MuST-C with multi-latency augmentation during training and develop a key-value (KV) cache management strategy to facilitate efficient inference. Experiments on MuST-C En-Es, En-De, and En-Zh demonstrate that InfiniSST reduces computation-aware latency by 0.5 to 1 second while maintaining the same translation quality compared to baselines. Ablation studies further validate the contributions of our data construction and cache management strategy. We release the code and demo at https://github.com/LeiLiLab/InfiniSST
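
The abstract does not spell out the dialogue format or the cache policy, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes each incoming speech chunk is a "user" turn and the translation emitted for it is an "assistant" turn, and it swaps in a plain sliding window over recent turns as a stand-in for KV cache management. The names `Turn`, `BoundedDialogue`, `toy_translate`, `run_stream`, and the `max_turns` parameter are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Turn:
    speech_chunk: str   # stand-in for the audio features of one chunk
    translation: str    # text emitted after hearing that chunk (may be empty)


@dataclass
class BoundedDialogue:
    """Keeps a fixed instruction plus only the most recent turns (assumed policy)."""
    instruction: str
    max_turns: int
    turns: deque = field(default_factory=deque)

    def add_turn(self, turn: Turn) -> None:
        self.turns.append(turn)
        # Evict the oldest turns so the context, and any cache built from it,
        # stays bounded no matter how long the stream runs.
        while len(self.turns) > self.max_turns:
            self.turns.popleft()

    def context(self) -> list:
        # Flatten the retained history into (role, content) messages.
        messages = [("system", self.instruction)]
        for t in self.turns:
            messages.append(("user", t.speech_chunk))
            messages.append(("assistant", t.translation))
        return messages


def toy_translate(chunk: str) -> str:
    # Hypothetical placeholder for the model: it just uppercases the chunk.
    return chunk.upper()


def run_stream(chunks, max_turns=2):
    dialogue = BoundedDialogue("Translate the speech.", max_turns)
    outputs = []
    for chunk in chunks:
        out = toy_translate(chunk)
        dialogue.add_turn(Turn(chunk, out))
        outputs.append(out)
    return outputs, dialogue.context()
```

Calling `run_stream(["a", "b", "c"], max_turns=2)` translates every chunk but retains only the instruction and the last two turns in the context. In a real system the retained window would correspond to cached keys and values rather than text, and the eviction rule may differ from this simple window.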

Code Implementations: 1 repo