CL · SD · AS · Mar 19, 2024

MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation

Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong

arXiv:2403.12408v1 · 5.59 citations · h-index: 33

Originality: Incremental advance

AI Analysis

This work addresses the need for more natural and personalized speech translation systems, though it appears incremental because it builds on existing speech language model approaches.

The paper tackles the problem of multilingual speech-to-speech translation without using text data, achieving speaker style preservation in the translated speech.

There has been emerging research interest and progress in speech-to-speech translation (S2ST), the task of translating utterances from one language to another. This work proposes the Multitask Speech Language Model (MSLM), a decoder-only speech language model trained in a multitask setting. Without relying on text training data, our model supports multilingual S2ST with speaker style preserved.
