CL, Dec 15, 2022

Attention as a Guide for Simultaneous Speech Translation

arXiv:2212.07850v2239 citations, h-index: 47
Originality: Incremental advance
AI Analysis

This work addresses real-time speech translation for low-latency applications. Its contribution is incremental, since it builds on existing attention mechanisms.

The paper tackles simultaneous speech translation by proposing an attention-based policy (EDAtt) that uses encoder-decoder attention scores to guide real-time inference. It reports overall better results than the state of the art, particularly in computational-aware latency, on English-to-German and English-to-Spanish translation.

The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation. Although its patterns have been exploited to perform different tasks, from neural network understanding to textual alignment, no previous work has analysed the encoder-decoder attention behavior in speech translation (ST) nor used it to improve ST on a specific task. In this paper, we fill this gap by proposing an attention-based policy (EDAtt) for simultaneous ST (SimulST) that is motivated by an analysis of the existing attention relations between audio input and textual output. Its goal is to leverage the encoder-decoder attention scores to guide inference in real time. Results on en->{de, es} show that the EDAtt policy achieves overall better results compared to the SimulST state of the art, especially in terms of computational-aware latency.
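
To make the idea of an attention-guided policy more concrete, here is a minimal sketch of how encoder-decoder attention scores could decide which hypothesis tokens are safe to emit. It rests on assumptions that the abstract does not state: the rule (emit a token only if little of its attention mass falls on the most recent audio frames), the function name `select_stable_prefix`, and the parameters `alpha` and `last_frames` are all illustrative, not the authors' implementation.

```python
from typing import Sequence


def select_stable_prefix(
    attention: Sequence[Sequence[float]],
    alpha: float = 0.1,
    last_frames: int = 2,
) -> int:
    """Return how many leading hypothesis tokens can be emitted now.

    attention[i][t] is the (hypothetical) encoder-decoder attention weight
    of predicted token i on audio frame t, for the audio received so far.
    Assumed rule: a token is emitted only if the share of its attention on
    the newest `last_frames` frames is below `alpha`; the first token that
    fails the check stops emission until more audio arrives.
    """
    if last_frames < 1:
        raise ValueError("last_frames must be at least 1")

    emitted = 0
    for row in attention:
        total = sum(row)
        if total <= 0:
            break  # no usable attention information for this token
        recent_share = sum(row[-last_frames:]) / total
        if recent_share >= alpha:
            break  # token still depends on the newest audio: wait
        emitted += 1
    return emitted


# Toy example: 3 predicted tokens, 4 audio frames received so far.
toy_attention = [
    [0.60, 0.30, 0.05, 0.05],  # focused on early frames
    [0.10, 0.70, 0.10, 0.10],  # focused on the second frame
    [0.05, 0.05, 0.30, 0.60],  # focused on the newest frames
]
stable = select_stable_prefix(toy_attention, alpha=0.4, last_frames=2)
```

In this toy input the first two tokens attend mostly to older frames and would be emitted, while the third still relies on the newest audio and would be held back. In a real system the attention matrix would come from a chosen decoder layer and head averaging scheme, which this sketch leaves open.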

Code Implementations: 2 repos
