CL · Jun 2

A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026

arXiv:2606.03948 · 38.5 · h-index: 29
Predicted impact top 17% in CL · last 90 days · Originality: Synthesis-oriented
AI Analysis

For researchers and practitioners in simultaneous speech translation, this work demonstrates that a compact offline model can be effectively adapted for real-time use, offering a practical alternative to larger systems.

The authors adapted the offline speech translation model Canary for simultaneous translation using the AlignAtt policy. The resulting 1B-parameter system supports 25 source and 25 target languages and outperformed similarly sized baselines in both low- and high-latency regimes in computationally unaware simulations.

We implement simultaneous translation capability with the offline direct speech-to-text translation model Canary, using the state-of-the-art policy AlignAtt, and submit it to the IWSLT 2026 Simultaneous Speech Translation Shared Task for Czech-to-English, English-to-German, and English-to-Italian. The strengths of our system are: (1) high translation quality, outperforming similarly sized baselines in both low- and high-latency regimes in computationally unaware simulations; (2) low computational requirements, as the model has only 1B parameters; (3) multilinguality, with support for 25 source and 25 target languages.
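
For readers unfamiliar with AlignAtt, the following is a minimal sketch of the general idea behind the policy, not the submission's actual code. It assumes the decoder exposes one cross-attention weight vector per hypothesis token over the audio frames received so far. A token is emitted only if its most-attended frame lies outside the last few frames; generation stops at the first token that leans on the newest audio. The function name `alignatt_emit` and the parameter `num_guard_frames` are hypothetical, and how attention is extracted from Canary (which layer, which heads) is not specified here.

```python
from typing import List, Sequence


def alignatt_emit(
    hypothesis: Sequence[str],
    attention: Sequence[Sequence[float]],
    num_guard_frames: int,
) -> List[str]:
    """Return the prefix of `hypothesis` considered stable enough to emit.

    attention[i][j] is the (assumed) cross-attention weight that hypothesis
    token i places on audio frame j of the input received so far.
    `num_guard_frames` is a hypothetical name for the number of most recent
    frames treated as not yet reliable.
    """
    emitted: List[str] = []
    for token, weights in zip(hypothesis, attention):
        num_frames = len(weights)
        if num_frames == 0:
            break  # no audio yet, nothing can be emitted
        # Index of the audio frame this token attends to most strongly.
        most_attended = max(range(num_frames), key=lambda j: weights[j])
        # If the token depends mainly on the newest frames, stop and wait
        # for more audio before emitting it or anything after it.
        if most_attended >= num_frames - num_guard_frames:
            break
        emitted.append(token)
    return emitted


# Toy example with 6 audio frames and made-up attention weights.
toy_hypothesis = ["Guten", "Morgen", "alle"]
toy_attention = [
    [0.1, 0.6, 0.1, 0.1, 0.05, 0.05],  # peak at frame 1
    [0.05, 0.1, 0.1, 0.6, 0.1, 0.05],  # peak at frame 3
    [0.05, 0.05, 0.1, 0.1, 0.1, 0.6],  # peak at frame 5
]
stable_prefix = alignatt_emit(toy_hypothesis, toy_attention, num_guard_frames=2)
```

In this sketch a larger `num_guard_frames` makes the policy more conservative (higher latency, fewer premature emissions), while a smaller value emits earlier. This is one way a single offline model could be run in both low- and high-latency regimes.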
