CL AS · Oct 1, 2019

Multilingual End-to-End Speech Translation

Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

arXiv:1910.00254v26.0100 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses the challenge of speech translation for multiple languages, offering a novel approach that could improve efficiency and performance in multilingual communication systems.

The paper tackles the problem of multilingual end-to-end speech translation by proposing a universal sequence-to-sequence framework that directly translates speech in source languages to target languages, showing that multilingual models significantly outperform bilingual ones in one-to-many and many-to-many scenarios.

In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they have been applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios: one-to-many and many-to-many translations with publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our code and the database are publicly available to encourage further research in this emergent multilingual ST topic.
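
The abstract does not spell out how a single universal model is told which target language to produce. A common device in multilingual sequence-to-sequence training, assumed here purely for illustration, is to prepend a target-language tag to each decoder target and pool all language pairs into one training set. The sketch below shows that data-preparation step only; the names `STExample`, `lang_tag`, and `build_multilingual_corpus`, the `<2xx>` tag format, and the file paths are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List, NamedTuple, Tuple


class STExample(NamedTuple):
    audio_path: str            # hypothetical path to a source speech utterance
    target_tokens: List[str]   # decoder target, starting with a language tag


def lang_tag(lang: str) -> str:
    """Return an illustrative target-language tag such as '<2de>' (assumed format)."""
    return f"<2{lang}>"


def build_multilingual_corpus(
    corpora: Dict[str, List[Tuple[str, str]]]
) -> List[STExample]:
    """Pool per-language (audio_path, translation) pairs into one tagged training set.

    `corpora` maps a target-language code to its list of pairs. Each translation is
    whitespace-tokenized and prefixed with the tag for its target language, so one
    shared decoder can be conditioned on the desired output language.
    """
    mixed: List[STExample] = []
    for lang, pairs in corpora.items():
        for audio_path, translation in pairs:
            tokens = [lang_tag(lang)] + translation.split()
            mixed.append(STExample(audio_path, tokens))
    return mixed


# One-to-many usage: the same utterance paired with two target languages.
corpus = build_multilingual_corpus({
    "de": [("utt1.wav", "guten morgen")],
    "fr": [("utt1.wav", "bonjour")],
})
```

In a many-to-many setting the same pooling would apply with several source languages in the audio, since only the target side carries the tag in this sketch.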

