CL · SD · AS · Mar 22, 2022

Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation

arXiv:2204.09595v3 · 15 citations · h-index: 52
Originality: Incremental advance
AI Analysis

This work improves adaptive policies for simultaneous speech translation. The advance is incremental, as it builds on existing pre-decision strategies.

The paper tackles the challenge of adaptive policy design in simultaneous speech translation by adapting the Continuous Integrate-and-Fire method, achieving superior quality at low latency and better generalization to long utterances compared to monotonic multihead attention.

Simultaneous speech translation (SimulST) is a challenging task that aims to translate streaming speech before the complete input is observed. A SimulST system generally includes two components: the pre-decision, which aggregates the speech information, and the policy, which decides whether to read or write. While recent works have proposed various strategies to improve the pre-decision, they mainly adopt the fixed wait-k policy, leaving adaptive policies rarely explored. This paper proposes to model the adaptive policy by adapting Continuous Integrate-and-Fire (CIF). Compared with monotonic multihead attention (MMA), our method has the advantages of simpler computation, superior quality at low latency, and better generalization to long utterances. We conduct experiments on the MuST-C V2 dataset and show the effectiveness of our approach.
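
To make the read/write idea concrete, here is a minimal sketch of a generic integrate-and-fire loop used as an adaptive policy. It is not the paper's implementation: the abstract does not describe the adaptation, so the function name `cif_fire`, the threshold of 1.0, the assumption that each weight lies in [0, 1], and the mapping of "fire" to a write action are all illustrative assumptions.

```python
from typing import List, Tuple


def cif_fire(
    weights: List[float],
    frames: List[List[float]],
    threshold: float = 1.0,
) -> Tuple[List[List[float]], List[int]]:
    """Accumulate weighted frames and fire when the weight sum reaches threshold.

    Hypothetical helper for illustration. Assumes every weight is in [0, 1],
    so at most one fire can happen per input frame when threshold is 1.0.
    Returns the integrated vectors and the frame index at which each one fired.
    """
    dim = len(frames[0]) if frames else 0
    acc_weight = 0.0
    acc_vec = [0.0] * dim
    fired: List[List[float]] = []
    fire_steps: List[int] = []

    for step, (w, h) in enumerate(zip(weights, frames)):
        if acc_weight + w < threshold:
            # Not enough evidence yet: keep integrating (a "read" action).
            acc_weight += w
            acc_vec = [a + w * x for a, x in zip(acc_vec, h)]
            continue

        # Threshold reached: spend only the part of w needed to hit it,
        # then emit the integrated vector (a "write" action).
        used = threshold - acc_weight
        fired.append([a + used * x for a, x in zip(acc_vec, h)])
        fire_steps.append(step)

        # The leftover weight of this frame seeds the next integration.
        remainder = w - used
        acc_weight = remainder
        acc_vec = [remainder * x for x in h]

    return fired, fire_steps


# Toy input: four 1-dimensional frames with made-up weights.
vectors, steps = cif_fire([0.4, 0.4, 0.4, 0.9], [[1.0], [2.0], [3.0], [4.0]])
```

In this toy run the loop reads the first two frames, fires on the third, and fires again on the fourth, so the point at which output is produced depends on the weights rather than on a fixed lag as in wait-k.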

Code Implementations: 1 repo