CL · Sep 26, 2025

SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation

Haotian Tan, Hiroki Ouchi, Sakriani Sakti

arXiv:2509.21932v1 · 2.7 · h-index: 36

Originality: Incremental advance

AI Analysis

This work addresses the challenge of real-time efficiency in simultaneous speech translation systems. It offers a novel approach to reducing latency and computational cost, though the contribution is an incremental improvement over existing methods.

The paper tackles the problem of making efficient read/write decisions for simultaneous speech translation. It proposes SimulSense, which mimics human interpreters by triggering translation when a new sense unit is perceived, and reports a superior quality-latency tradeoff with decision-making up to 9.6x faster than the baselines.

How to make human-interpreter-like read/write decisions for simultaneous speech translation (SimulST) systems? Current state-of-the-art systems formulate SimulST as a multi-turn dialogue task, requiring specialized interleaved training data and relying on computationally expensive large language model (LLM) inference for decision-making. In this paper, we propose SimulSense, a novel framework for SimulST that mimics human interpreters by continuously reading input speech and triggering write decisions to produce translation when a new sense unit is perceived. Experiments against two state-of-the-art baseline systems demonstrate that our proposed method achieves a superior quality-latency tradeoff and substantially improved real-time efficiency, where its decision-making is up to 9.6x faster than the baselines.
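The read/write behavior described in the abstract can be illustrated with a minimal sketch of the control flow only. Everything here is an assumption for illustration: text tokens stand in for incoming speech chunks, and the class `SenseDrivenPolicy`, the boundary detector (a crude punctuation rule), and the translator (a bracketing function) are hypothetical placeholders, not SimulSense components. The system keeps reading input, and a write is triggered only when the detector reports that a new sense unit has been perceived.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class SenseDrivenPolicy:
    """Toy read/write loop: READ every incoming chunk, WRITE when a sense unit closes.

    Both callables are hypothetical stand-ins, not the paper's models.
    """
    is_sense_boundary: Callable[[List[str]], bool]  # hypothetical detector
    translate: Callable[[List[str]], str]           # hypothetical translator
    buffer: List[str] = field(default_factory=list)

    def run(self, chunks: Iterable[str]) -> List[str]:
        outputs: List[str] = []
        for chunk in chunks:
            self.buffer.append(chunk)  # READ: keep consuming input
            if self.is_sense_boundary(self.buffer):
                # WRITE: a new sense unit was perceived, so translate it
                outputs.append(self.translate(self.buffer))
                self.buffer = []
        if self.buffer:  # flush any unfinished unit at end of stream
            outputs.append(self.translate(self.buffer))
            self.buffer = []
        return outputs


if __name__ == "__main__":
    policy = SenseDrivenPolicy(
        # Crude assumption: a sense unit ends at a comma or period.
        is_sense_boundary=lambda buf: buf[-1].endswith((",", ".")),
        # Placeholder "translation": just bracket the source segment.
        translate=lambda words: "[" + " ".join(words) + "]",
    )
    stream = "when the meeting ends, after lunch, we will review the draft.".split()
    print(policy.run(stream))
    # ['[when the meeting ends,]', '[after lunch,]', '[we will review the draft.]']
```

In this sketch the write decision is a cheap standalone check kept separate from the translator. A real system would detect sense units from speech features and would condition each translation on earlier context; both are simplified away here.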

