CL, SD, AS | Oct 28, 2022

Efficient Speech Translation with Dynamic Latent Perceivers

Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

arXiv:2210.16264v2 | 0.84 citations | h-index: 26 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the computational bottleneck in Speech Translation for researchers and practitioners, offering an incremental improvement in efficiency without sacrificing quality.

The paper tackles the computational inefficiency of Transformers in Speech Translation by proposing a Perceiver encoder with Dynamic Latent Access (DLA), which matches Transformer performance across three language pairs in MuST-C while enabling flexible deployment with various computational budgets.

Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality. Since speech signals are longer than their textual counterparts, and due to the quadratic complexity of the Transformer, a down-sampling step is essential for its adoption in Speech Translation. Instead, in this research, we propose to ease the complexity by using a Perceiver encoder to map the speech inputs to a fixed-length latent representation. Furthermore, we introduce a novel way of training Perceivers, with Dynamic Latent Access (DLA), unlocking larger latent spaces without any additional computational overhead. Speech-to-Text Perceivers with DLA can match the performance of Transformer baselines across three language pairs in MuST-C. Finally, a DLA-trained model is easily adaptable to DLA at inference, and can be flexibly deployed with various computational budgets, without significant drops in translation quality.
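
The idea in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes that a Perceiver encoder lets a small set of learned latent vectors cross-attend to the speech frames, and that Dynamic Latent Access means using only k latents out of a larger pool of n at each step. The names `cross_attend`, `encode`, and `latent_pool`, the random sampling, and the choice of the first k latents at inference are all hypothetical, and learned projections, multiple heads, and stacked layers are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(latents, inputs):
    """Single-head cross-attention: latents are queries, inputs are keys and values.

    latents: (k, d), inputs: (T, d). The score matrix is (k, T), so the cost
    grows linearly with the input length T instead of quadratically.
    """
    d = latents.shape[-1]
    scores = latents @ inputs.T / np.sqrt(d)  # (k, T)
    return softmax(scores, axis=-1) @ inputs  # (k, d)


def encode(inputs, latent_pool, k, rng=None):
    """Map (T, d) inputs to a fixed (k, d) representation using k of n latents.

    Assumption: with an rng (training), k latents are sampled from the pool;
    without one (inference), the first k latents are used.
    """
    n = latent_pool.shape[0]
    if rng is None:
        idx = np.arange(k)
    else:
        idx = np.sort(rng.choice(n, size=k, replace=False))
    return cross_attend(latent_pool[idx], inputs)


rng = np.random.default_rng(0)
d, n = 16, 64
latent_pool = rng.normal(size=(n, d))  # stands in for learned latents
short_utt = rng.normal(size=(50, d))   # 50 speech frames
long_utt = rng.normal(size=(500, d))   # 500 speech frames

out_short = encode(short_utt, latent_pool, k=8, rng=rng)
out_long = encode(long_utt, latent_pool, k=8, rng=rng)
out_budget = encode(long_utt, latent_pool, k=32)  # larger budget at inference

print(out_short.shape, out_long.shape, out_budget.shape)
```

Both utterances come out with the same number of rows regardless of their length, and changing k at inference changes only the size of the output and of the attention score matrix. In this sketch that is the knob for trading compute against representation size.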

