An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies
This work addresses the latency-quality trade-off in real-time speech translation systems; the contribution is incremental, adding a new decoding strategy on top of existing end-to-end models.
The paper tackles the problem of controlling the trade-off between translation quality (BLEU) and latency (Average Lagging) in end-to-end simultaneous speech translation, achieving results comparable to a strong cascade model on the IWSLT 2020 shared task.
This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities, including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding approach makes it possible to control the BLEU/Average Lagging trade-off across different latency regimes. Our best decoding settings achieve results comparable to a strong cascade model evaluated on the simultaneous translation track of the IWSLT 2020 shared task.
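To make the latency metric concrete, the following is a minimal sketch of Average Lagging in its standard text-based form (Ma et al., 2019), where delays are counted in source tokens. It is an illustration only, not the evaluation code used in this study: the function name `average_lagging` and its arguments are hypothetical, and the speech variant used for the shared task measures delays in time rather than in tokens.

```python
from typing import Sequence


def average_lagging(delays: Sequence[int], src_len: int) -> float:
    """Text-based Average Lagging (illustrative sketch).

    delays[i] is the number of source units that had been read when
    target token i was emitted; src_len is the total source length.
    """
    if not delays or src_len <= 0:
        raise ValueError("delays must be non-empty and src_len positive")

    tgt_len = len(delays)
    gamma = tgt_len / src_len  # target-to-source length ratio

    # tau: 1-based index of the first target token emitted after the
    # whole source has been read (falls back to the last target token).
    tau = next(
        (t for t, g in enumerate(delays, start=1) if g >= src_len),
        tgt_len,
    )

    # Average, over the first tau tokens, of how far the system lags
    # behind an ideal translator that is perfectly in sync with the source.
    total = sum(delays[t - 1] - (t - 1) / gamma for t in range(1, tau + 1))
    return total / tau


if __name__ == "__main__":
    # A wait-3 policy on 5 source and 5 target tokens.
    print(average_lagging([3, 4, 5, 5, 5], src_len=5))
```

Under these assumptions, a wait-3 policy on five source and five target tokens yields an Average Lagging of 3, while an offline system that reads the whole source before emitting anything yields an Average Lagging equal to the source length.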