CL · Jun 20, 2025

Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025

arXiv:2506.17077v1 · 7 citations · h-index: 9 · IWSLT
Originality: Incremental advance
AI Analysis

This work makes incremental improvements to simultaneous speech translation across four language pairs, potentially benefiting real-time communication applications.

The paper tackles simultaneous speech translation using Whisper with the AlignAtt policy and EuroLLM, improving over the organizers' baseline by 2 BLEU points on Czech to English and 13-22 BLEU points on English to German, Chinese and Japanese on the development sets.

This paper describes the Charles University submission to the Simultaneous Speech Translation Task of IWSLT 2025. We cover all four language pairs with a direct or cascade approach. The backbone of our systems is the offline Whisper speech model, which we use for both translation and transcription in simultaneous mode with the state-of-the-art simultaneous policy AlignAtt. We further improve performance by prompting to inject in-domain terminology, and we accommodate context. Our cascaded systems further use EuroLLM for unbounded simultaneous translation. Compared to the Organizers' baseline, our systems improve by 2 BLEU points on Czech to English and 13-22 BLEU points on English to German, Chinese and Japanese on the development sets. We also propose a new enhanced measure of speech recognition latency.

