CL · Dec 19, 2025

Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

arXiv:2512.17648v1 · 4 citations · h-index: 34 · Has Code
Originality: Synthesis-oriented
AI Analysis

Simulstream provides a unified tool for researchers and developers working on streaming speech-to-text translation, though its contribution is incremental, as it responds to established evaluation needs.

The authors address the lack of a maintained, comprehensive framework for evaluating and demonstrating streaming speech-to-text translation systems, which must deliver both low latency and high quality. Their solution is Simulstream, an open-source toolkit that supports long-form audio, incremental decoding, and re-translation methods, and includes an interactive web interface.

Streaming Speech-to-Text Translation (StreamST) requires producing translations concurrently with incoming speech, imposing strict latency constraints and demanding models that balance decision-making on partial information with high translation quality. Research on the topic has so far relied on the SimulEval repository, which is no longer maintained and does not support systems that revise their outputs. In addition, it was designed to simulate the processing of short segments rather than long-form audio streams, and it offers no easy way to showcase systems in a demo. As a solution, we introduce simulstream, the first open-source framework dedicated to the unified evaluation and demonstration of StreamST systems. Designed for long-form speech processing, it supports not only incremental decoding approaches but also re-translation methods, enabling their comparison within the same framework in terms of both quality and latency. It also offers an interactive web interface to demo any system built within the tool.
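To make the difference between the two families of systems concrete, here is a minimal sketch under stated assumptions; it is not simulstream's API, and the class `StreamingOutput` and its methods are hypothetical names chosen for illustration. Incremental decoding only appends tokens to what has already been emitted, whereas re-translation may rewrite the whole hypothesis at each update. Counting the tokens that re-translation retracts (everything past the longest common prefix with the previous output) gives an erasure measure, in the spirit of the normalized erasure used in the re-translation literature. A framework that evaluates both families has to account for this alongside quality and latency.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamingOutput:
    """Tracks the target text a streaming system has displayed so far.

    Hypothetical helper for illustration only; not part of simulstream.
    """
    tokens: List[str] = field(default_factory=list)
    erased: int = 0  # total tokens retracted across all updates

    def append(self, new_tokens: List[str]) -> None:
        """Incremental decoding: emitted tokens are never revised."""
        self.tokens.extend(new_tokens)

    def replace(self, hypothesis: List[str]) -> None:
        """Re-translation: the whole hypothesis may be rewritten.

        Previously shown tokens beyond the longest common prefix with the
        new hypothesis count as erased.
        """
        prefix = 0
        for old, new in zip(self.tokens, hypothesis):
            if old != new:
                break
            prefix += 1
        self.erased += len(self.tokens) - prefix
        self.tokens = list(hypothesis)

    def normalized_erasure(self) -> float:
        """Total erased tokens divided by the final output length."""
        return self.erased / len(self.tokens) if self.tokens else 0.0


# Incremental decoding: output only grows, so nothing is erased.
inc = StreamingOutput()
for chunk in (["Hello"], ["world", "!"]):
    inc.append(chunk)
print(" ".join(inc.tokens), inc.erased)  # Hello world ! 0

# Re-translation: the first hypothesis "Hi" is later rewritten.
ret = StreamingOutput()
for hyp in (["Hi"], ["Hello", "world"], ["Hello", "world", "!"]):
    ret.replace(hyp)
print(" ".join(ret.tokens), ret.erased)  # Hello world ! 1
```

Both runs end with the same text, but only the re-translation run flickered on screen; this is the kind of behaviour a tool that does not support output revision cannot measure.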

