CL · Nov 2, 2020

Enabling Zero-shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders

arXiv:2011.01097v2 · 18 citations
AI Analysis

This work addresses data scarcity in multilingual spoken language translation and enables zero-shot translation across multiple languages. The contribution is incremental, since it builds on existing multilingual NMT methods.

The paper tackles the shortage of training data for multilingual spoken language translation by coupling a speech encoder to a multilingual neural machine translation architecture built on language-specific encoders and decoders. The resulting system matches a bilingual baseline to within ±0.2 BLEU and supports zero-shot translation without any multilingual SLT data. The paper also introduces an Adapter module that improves results by up to +6 BLEU points on the proposed architecture.

Current end-to-end approaches to Spoken Language Translation (SLT) rely on limited training resources, especially in multilingual settings. Multilingual Neural Machine Translation (MultiNMT) approaches, on the other hand, rely on larger and higher-quality data sets. Our proposed method extends a MultiNMT architecture based on language-specific encoders and decoders to the task of Multilingual SLT (MultiSLT). Our method entirely eliminates the dependency on MultiSLT data and is able to translate after training only on ASR and MultiNMT data. Our experiments on four different languages show that coupling the speech encoder to the MultiNMT architecture produces translations of similar quality to a bilingual baseline ($\pm 0.2$ BLEU) while effectively allowing for zero-shot MultiSLT. Additionally, we propose using an Adapter module for coupling the speech inputs. This Adapter module produces consistent improvements of up to +6 BLEU points on the proposed architecture and +1 BLEU point on the end-to-end baseline.
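The modular coupling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the language codes, the dimensions, the `make_linear` helper, and the use of random linear maps in place of trained networks are all assumptions made for illustration. The only point shown is the routing: one speech encoder, optionally followed by an adapter, feeds a shared representation that any language-specific decoder can consume, so a speech-to-target pair never seen together in training can still be composed.

```python
import random
from typing import Callable, Dict, List

Vector = List[float]

# Hypothetical sizes and language codes, chosen only for this sketch.
SPEECH_DIM = 6   # size of a toy speech feature vector
DIM = 4          # size of the shared intermediate representation
LANGS = ["en", "de", "fr", "es"]


def make_linear(in_dim: int, out_dim: int, seed: int) -> Callable[[Vector], Vector]:
    """Return a fixed random linear map, standing in for a trained module."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def apply(x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return apply


# One text encoder and one decoder per language (the MultiNMT part).
text_encoders: Dict[str, Callable[[Vector], Vector]] = {
    lang: make_linear(DIM, DIM, seed=i) for i, lang in enumerate(LANGS)
}
decoders: Dict[str, Callable[[Vector], Vector]] = {
    lang: make_linear(DIM, DIM, seed=10 + i) for i, lang in enumerate(LANGS)
}

# A single speech encoder plus an adapter (placeholder modules).
speech_encoder = make_linear(SPEECH_DIM, DIM, seed=100)
adapter = make_linear(DIM, DIM, seed=200)


def translate_speech(features: Vector, tgt_lang: str, use_adapter: bool = True) -> Vector:
    """Route speech features through the shared space into any target decoder."""
    hidden = speech_encoder(features)
    if use_adapter:
        hidden = adapter(hidden)
    return decoders[tgt_lang](hidden)


if __name__ == "__main__":
    toy_features = [0.1] * SPEECH_DIM
    for lang in LANGS:
        print(lang, translate_speech(toy_features, lang))
```

In this sketch, adding a target language only requires a new entry in `decoders`; the speech side is untouched, which mirrors the zero-shot composition the abstract describes.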
