Transformer-Encoder Trees for Efficient Multilingual Machine Translation and Speech Translation
The paper addresses efficiency and accuracy problems in multilingual and speech translation systems, offering incremental improvements in speed and resource usage.
The paper tackles computational redundancy and limited accuracy in multilingual translation, especially for low-resource languages, by proposing a Transformer Encoder Tree (TET) combined with non-autoregressive models. In speech translation, the approach is 7-14 times faster than autoregressive systems while maintaining competitive quality.
Multilingual translation faces challenges of computational redundancy and limited accuracy for low-resource languages, especially in speech translation. To address this, we propose a novel hierarchical Transformer Encoder Tree (TET) combined with non-autoregressive encoder-only models trained with Connectionist Temporal Classification for multilingual translation. By sharing intermediate representations among linguistically similar target languages, TET can improve accuracy on low-resource languages, reduce computational redundancy, and allow generating all target languages in a single forward pass, thus eliminating sequential bottlenecks and improving parallelism. For speech translation, combining TET with a non-autoregressive speech recognition backbone (wav2vec2) shows promising results in terms of translation quality compared to autoregressive systems while being 7-14 times faster.
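
The sharing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tree shape, the language grouping (es/pt and de/nl), the node names, and the `add` transforms that stand in for stacks of Transformer encoder layers are all assumptions chosen for illustration. The sketch only shows that one traversal of the tree evaluates each shared node once and still produces an output for every target language.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = List[float]


@dataclass
class Node:
    """One tree node; `transform` is a stand-in for a stack of encoder layers."""
    name: str
    transform: Callable[[Vector], Vector]
    children: List["Node"] = field(default_factory=list)
    language: str = ""  # set on leaves only (hypothetical target language code)


def add(k: float) -> Callable[[Vector], Vector]:
    # Toy transform: adds a constant, so outputs reveal which nodes were applied.
    return lambda h: [x + k for x in h]


def build_toy_tree() -> Node:
    # Assumed grouping of similar target languages under shared branches.
    es = Node("es_leaf", add(1.0), language="es")
    pt = Node("pt_leaf", add(2.0), language="pt")
    de = Node("de_leaf", add(3.0), language="de")
    nl = Node("nl_leaf", add(4.0), language="nl")
    romance = Node("romance", add(10.0), [es, pt])
    germanic = Node("germanic", add(20.0), [de, nl])
    return Node("root", add(100.0), [romance, germanic])


def forward_tree(node: Node, hidden: Vector,
                 outputs: Dict[str, Vector], calls: List[str]) -> None:
    # Each node is evaluated exactly once; its result is reused by all children.
    calls.append(node.name)
    hidden = node.transform(hidden)
    if not node.children:
        outputs[node.language] = hidden
    for child in node.children:
        forward_tree(child, hidden, outputs, calls)


def translate_all(root: Node, source: Vector) -> Tuple[Dict[str, Vector], List[str]]:
    outputs: Dict[str, Vector] = {}
    calls: List[str] = []
    forward_tree(root, source, outputs, calls)
    return outputs, calls


def unshared_cost(node: Node, depth: int = 1) -> int:
    # Node evaluations needed if every language ran its own root-to-leaf path.
    if not node.children:
        return depth
    return sum(unshared_cost(child, depth + 1) for child in node.children)


if __name__ == "__main__":
    root = build_toy_tree()
    outputs, calls = translate_all(root, [0.0])
    print(sorted(outputs))                  # target languages produced in one pass
    print(len(calls), unshared_cost(root))  # shared vs. unshared node evaluations
```

In this toy tree, a single pass evaluates 7 nodes, whereas running a separate root-to-leaf path for each of the 4 languages would evaluate 12. In a real system the leaves would feed CTC output layers and the branches could be computed in parallel; both are omitted here.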