Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025
This work improves simultaneous speech translation for four language pairs, which may benefit real-time communication applications.
The paper tackles simultaneous speech translation using Whisper with the AlignAtt policy and, in cascaded systems, EuroLLM. It improves on the organizers' baseline by 2 BLEU points on Czech to English and by 13-22 BLEU points on English to German, Chinese, and Japanese on the development sets.
This paper describes the Charles University submission to the Simultaneous Speech Translation Task of IWSLT 2025. We cover all four language pairs with either a direct or a cascaded approach. The backbone of our systems is the offline Whisper speech model, which we use for both translation and transcription in simultaneous mode with the state-of-the-art simultaneous policy AlignAtt. We further improve performance by prompting the model with in-domain terminology and by accommodating context. Our cascaded systems additionally use EuroLLM for unbounded simultaneous translation. Compared to the organizers' baseline, our systems improve by 2 BLEU points on Czech to English and by 13-22 BLEU points on English to German, Chinese, and Japanese on the development sets. We also propose a new, enhanced measure of speech recognition latency.
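To illustrate what the AlignAtt policy decides at each step, here is a minimal sketch of its emission rule as described by its original authors. A candidate output token is emitted only if the audio frame it attends to most lies outside the last `f` frames received so far. Otherwise the system stops and waits for more audio. This is not the submission's code. The function name `alignatt_emit`, the parameter `frame_threshold` (the `f` above), and the assumption that cross-attention has already been reduced to one weight row per token (for example, averaged over selected Whisper heads) are all hypothetical choices for illustration.

```python
from typing import List, Sequence


def alignatt_emit(
    candidate_tokens: Sequence[str],
    cross_attention: Sequence[Sequence[float]],
    num_frames: int,
    frame_threshold: int,
) -> List[str]:
    """Return the prefix of candidate_tokens that an AlignAtt-style rule would emit.

    Hypothetical sketch: cross_attention[i][j] is the (already aggregated)
    attention weight from candidate token i to received audio frame j.
    """
    emitted: List[str] = []
    for token, weights in zip(candidate_tokens, cross_attention):
        if len(weights) != num_frames:
            raise ValueError("each attention row must cover all received frames")
        # Alignment of this token = index of its most-attended audio frame.
        aligned_frame = max(range(num_frames), key=lambda j: weights[j])
        # Token aligns with the newest audio, so it may still change: wait.
        if aligned_frame >= num_frames - frame_threshold:
            break
        emitted.append(token)
    return emitted


if __name__ == "__main__":
    tokens = ["Hello", "world", "again"]
    attention = [
        [0.1, 0.6, 0.1, 0.1, 0.05, 0.05],  # "Hello" aligns to frame 1
        [0.0, 0.1, 0.1, 0.7, 0.05, 0.05],  # "world" aligns to frame 3
        [0.0, 0.0, 0.1, 0.1, 0.2, 0.6],    # "again" aligns to frame 5
    ]
    # With 6 frames and frame_threshold=2, frames 4 and 5 count as "too new".
    print(alignatt_emit(tokens, attention, num_frames=6, frame_threshold=2))
```

Running the script prints `['Hello', 'world']`. "again" is withheld because it attends to the newest audio. A larger `frame_threshold` trades latency for stability. A full system would also flush all remaining tokens once the input stream ends.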