Does Simultaneous Speech Translation need Simultaneous Models?
This work addresses the computational and environmental costs that speech translation researchers and practitioners incur when training and maintaining multiple dedicated simultaneous models, and proposes a more efficient alternative. The contribution is incremental, however, since it builds on existing offline techniques.
The paper tackles the high computational cost of simultaneous speech translation by investigating whether a single offline-trained model can perform both offline and simultaneous translation without additional training. Experiments on English-to-German and English-to-Spanish show that the offline solution achieves similar or better translation quality than the same model trained in simultaneous settings and is competitive with state-of-the-art SimulST systems, reducing the need for multiple dedicated models.
In simultaneous speech translation (SimulST), finding the best trade-off between high translation quality and low latency is a challenging task. To meet the latency constraints posed by the different application scenarios, multiple dedicated SimulST models are usually trained and maintained, generating high computational costs. In this paper, motivated by the increased social and environmental impact caused by these costs, we investigate whether a single model trained offline can serve not only the offline but also the simultaneous task without the need for any additional training or adaptation. Experiments on en→{de, es} indicate that, aside from facilitating the adoption of well-established offline techniques and architectures without affecting latency, the offline solution achieves similar or better translation quality compared to the same model trained in simultaneous settings, and is competitive with the SimulST state of the art.
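The core idea, one offline model whose latency is chosen at inference time rather than fixed by training, can be illustrated with a fixed read/write policy. The minimal sketch below wraps a model in the well-known wait-k schedule (read k source chunks, then alternate writing one target token and reading one more chunk), so changing `k` moves the same model along the latency axis without retraining. This is a generic illustration under stated assumptions, not necessarily the policy used in the paper: `toy_offline_model`, `LEXICON`, and the word-level "chunks" are hypothetical stand-ins for a real speech encoder-decoder operating on audio segments.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface for an offline-trained model: given the source prefix
# received so far and the target tokens already committed, return the next
# target token, or None if the model predicts end of sentence.
OfflineModel = Callable[[Sequence[str], Sequence[str]], Optional[str]]


def wait_k_decode(model: OfflineModel, source: Sequence[str], k: int,
                  max_len: int = 200) -> Tuple[List[str], List[int]]:
    """Run an offline model under a wait-k read/write policy.

    Returns the committed target tokens and, for each token, the number of
    source chunks that had been read when it was emitted (its delay).
    """
    read = min(k, len(source))  # initial wait: read the first k chunks
    target: List[str] = []
    delays: List[int] = []
    while len(target) < max_len:  # guard against a model that never stops
        token = model(source[:read], target)
        if token is None:
            if read == len(source):
                break  # whole source consumed and model predicts EOS: done
            read += 1  # premature EOS on a partial prefix: read more instead
            continue
        target.append(token)  # write: commit one token, never revised
        delays.append(read)
        read = min(read + 1, len(source))  # then read one more chunk
    return target, delays


# Hypothetical toy "offline model": word-by-word lexicon lookup (en -> es).
LEXICON = {"the": "el", "cat": "gato", "sleeps": "duerme", "now": "ahora"}


def toy_offline_model(src_prefix: Sequence[str],
                      tgt_prefix: Sequence[str]) -> Optional[str]:
    i = len(tgt_prefix)
    if i < len(src_prefix):
        return LEXICON.get(src_prefix[i], src_prefix[i])
    return None  # nothing left to translate in the current prefix


if __name__ == "__main__":
    src = ["the", "cat", "sleeps", "now"]
    for k in (1, 2, 4):
        out, delays = wait_k_decode(toy_offline_model, src, k)
        print(k, out, delays, sum(delays) / len(delays))
```

On this four-word input, `k` = 1, 2 and 4 give mean delays of 2.5, 3.25 and 4.0 source chunks from the same model; `k` = 4 already equals full offline decoding. Because the toy model translates word by word, its output does not change with `k`; with a real model, smaller `k` means decoding from shorter prefixes, which is where the quality/latency trade-off discussed above appears.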