CL, SD, AS
Mar 19, 2024

MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation

arXiv:2403.12408v1
9 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more natural and personalized speech translation systems, though it appears incremental because it builds on existing speech language model approaches.

The paper tackles the problem of multilingual speech-to-speech translation without using text data, achieving speaker style preservation in the translated speech.

There has been growing research interest and progress in speech-to-speech translation (S2ST), the task of translating utterances from one language to another. This work proposes the Multitask Speech Language Model (MSLM), a decoder-only speech language model trained in a multitask setting. Without relying on text training data, the model supports multilingual S2ST while preserving speaker style.

