CL, SD, AS · Jun 12, 2024

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

Peidong Wang, Jian Xue, Jinyu Li, Junkun Chen, Aswin Shanmugam Subramanian

arXiv:2406.10276v11.0

Originality: Incremental advance

AI Analysis

This work addresses the need for flexible speech translation systems that can leverage optional language information to boost performance for particular languages, offering an incremental improvement for users of multilingual translation tools.

The paper addresses how to improve a specified language in a language-agnostic many-to-one speech translation model without degrading performance on the other languages. It does so by introducing a linear input network that improves the specified language while maintaining overall translation quality.

Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In some cases, the input language can be given or estimated. Our goal is to use this additional language information while preserving the quality of the other languages. We accomplish this by introducing a simple and effective linear input network. The linear input network is initialized as an identity matrix, which ensures that the model can perform as well as, or better than, the original model. Experimental results show that the proposed method can successfully enhance the specified language, while keeping the language-agnostic ability of the many-to-one ST models.
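The identity-initialization idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only, not the authors' implementation: the feature dimension of 80, the use of one matrix per language, the `prepare_features` routing helper, and the language code "de" are all hypothetical. The sketch only shows why an identity-initialized linear map leaves the model's input, and therefore its behavior, unchanged before any fine-tuning.

```python
import numpy as np

FEATURE_DIM = 80  # assumed feature size; the abstract does not state one


class LinearInputNetwork:
    """A square linear map applied to every input feature frame (hypothetical helper)."""

    def __init__(self, dim: int) -> None:
        # Identity initialization: before fine-tuning, features pass through unchanged,
        # so the downstream translation model sees exactly its original input.
        self.weight = np.eye(dim, dtype=np.float32)

    def __call__(self, features: np.ndarray) -> np.ndarray:
        # features has shape (num_frames, dim)
        return features @ self.weight.T


def prepare_features(features, language=None, networks=None):
    """Apply a language's input network if the language is given and known.

    Assumption: when no language is supplied, the features are passed through
    untouched, which keeps the language-agnostic path identical to the original model.
    """
    if language is not None and networks and language in networks:
        return networks[language](features)
    return features


rng = np.random.default_rng(0)
frames = rng.standard_normal((5, FEATURE_DIM)).astype(np.float32)
networks = {"de": LinearInputNetwork(FEATURE_DIM)}  # "de" is an arbitrary example

out_with_lang = prepare_features(frames, "de", networks)  # language provided
out_agnostic = prepare_features(frames)                   # no language provided
```

In this sketch, only the matrix for the specified language would be updated during fine-tuning, so inputs without a language label never pass through a modified layer. Whether the paper trains or routes the network this way is not stated in the abstract.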
