CL · Mar 4, 2021

An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies

arXiv:2103.03233v1 · 20 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the latency-quality trade-off in real-time speech translation systems, but it is incremental: it applies a new decoding strategy to existing end-to-end models.

The paper tackles the problem of controlling the trade-off between translation quality (BLEU) and latency (Average Lagging) in end-to-end simultaneous speech translation, achieving results comparable to a strong cascade model on the IWSLT 2020 shared task.

This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities, including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding approach makes it possible to control the BLEU/Average Lagging trade-off across different latency regimes. Our best decoding settings achieve results comparable to those of a strong cascade model evaluated on the simultaneous translation track of the IWSLT 2020 shared task.
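The abstract names Average Lagging (AL) as its latency measure without defining it. As a rough illustration, the sketch below computes AL for a single sentence using the text-based definition from Ma et al. (2019), assuming `delays[t]` records how much source had been read when target token `t` was emitted. The function name, argument names, and units are illustrative assumptions, and the speech-adapted variant used in the IWSLT 2020 evaluation may differ in detail (for example, by measuring delays in milliseconds of audio).

```python
from typing import Sequence


def average_lagging(delays: Sequence[float], src_len: float) -> float:
    """Average Lagging for one sentence (text-based definition, illustrative).

    delays[t] is the amount of source consumed (tokens, or any consistent
    unit) when target token t was emitted; src_len is the total source length
    in the same unit.
    """
    if not delays or src_len <= 0:
        raise ValueError("need at least one target token and a positive source length")

    tgt_len = len(delays)
    gamma = tgt_len / src_len  # target-to-source length ratio

    total = 0.0
    for t, g in enumerate(delays):
        # t is 0-based here, so it plays the role of (t - 1) in the formula:
        # lag of this token relative to an ideal, perfectly synchronous system.
        total += g - t / gamma
        if g >= src_len:
            # Stop at the first token emitted after the full source was read.
            return total / (t + 1)

    # The full source was never consumed: average over all emitted tokens.
    return total / tgt_len


# A wait-2 policy on a 5-token source with a 5-token target.
print(average_lagging([2, 3, 4, 5, 5], src_len=5))
```

Under this definition a wait-k policy on equal-length source and target yields an AL of k, while an offline system that reads the whole source before emitting anything yields an AL equal to the source length.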

