SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation
This work brings simultaneous text translation methods to end-to-end speech translation for real-time applications. The contribution appears incremental, since it adapts existing methods rather than introducing a fundamentally new approach.
The paper adapts simultaneous text translation methods to end-to-end simultaneous speech translation. It does so by introducing a pre-decision module, analyzing latency-quality trade-offs, and designing a novel computation-aware latency metric. The results are presented as a detailed analysis; no specific numerical gains are reported.
Simultaneous text translation and end-to-end speech translation have recently made great progress, but little work has combined the two tasks. We investigate how to adapt simultaneous text translation methods such as wait-k and monotonic multihead attention to end-to-end simultaneous speech translation by introducing a pre-decision module. A detailed analysis is provided on the latency-quality trade-offs of combining fixed and flexible pre-decision with fixed and flexible policies. We also design a novel computation-aware latency metric, adapted from Average Lagging.
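
To make the interaction between a fixed policy and a fixed pre-decision more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that fixed pre-decision means one read/write decision every `stride` speech frames, which the abstract does not spell out. It then applies the standard wait-k schedule over the resulting chunks and scores it with the text-based Average Lagging definition. The function names `frames_to_chunks`, `wait_k_delays`, and `average_lagging` are hypothetical. The computation-aware variant described in the paper is not reproduced here.

```python
from typing import List


def frames_to_chunks(num_frames: int, stride: int) -> int:
    """Assumed fixed pre-decision: one decision point every `stride` frames."""
    return -(-num_frames // stride)  # ceiling division


def wait_k_delays(num_chunks: int, num_targets: int, k: int) -> List[int]:
    """Return g(i): how many source chunks have been read when target token i is written.

    wait-k reads k chunks first, then alternates one write with one read
    until the source is exhausted.
    """
    return [min(k + i, num_chunks) for i in range(num_targets)]


def average_lagging(delays: List[int], src_len: int) -> float:
    """Text-based Average Lagging over delays g(1..|Y|) for a source of length src_len."""
    gamma = len(delays) / src_len  # target-to-source length ratio
    # tau: first (1-indexed) target position whose delay covers the whole source
    tau = next(
        (i for i, g in enumerate(delays, start=1) if g >= src_len),
        len(delays),
    )
    # Sum of g(i) - (i - 1) / gamma for i = 1..tau, written with 0-based indices
    return sum(delays[i] - i / gamma for i in range(tau)) / tau


if __name__ == "__main__":
    chunks = frames_to_chunks(num_frames=40, stride=8)
    delays = wait_k_delays(num_chunks=chunks, num_targets=5, k=2)
    print(chunks, delays, average_lagging(delays, chunks))
```

In this sketch latency is counted in chunks rather than in milliseconds. A speech-oriented metric would instead measure delays in elapsed time, and a computation-aware one would also account for the time the model spends computing.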