CL | Jan 23

PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs

arXiv:2601.16618v1 | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the underexplored application of LLMs to S2ST, a domain-specific problem within speech translation. It is rated incremental because it builds on existing methods while adding techniques of its own.

The paper tackles data scarcity in applying Large Language Models (LLMs) to Speech-to-Speech Translation (S2ST). It proposes PROST-LLM, which enhances S2ST capability in stages: fine-tuning with tri-task learning, generating preference pairs through self-sampling, and preference optimization. The authors report that experiments confirm improved performance.

Although Large Language Models (LLMs) excel in many tasks, their application to Speech-to-Speech Translation (S2ST) is underexplored and hindered by data scarcity. To bridge this gap, we propose PROST-LLM (PROgressive Speech-to-speech Translation) to enhance the S2ST capabilities in LLMs progressively. First, we fine-tune the LLMs with the CVSS corpus, employing designed tri-task learning and chain of modality methods to boost the initial performance. Then, leveraging the fine-tuned model, we generate preference pairs through self-sampling and back-translation without human evaluation. Finally, these preference pairs are used for preference optimization to enhance the model's S2ST capability further. Extensive experiments confirm the effectiveness of our proposed PROST-LLM in improving the S2ST capability of LLMs.
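
The preference-pair step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: plain text strings stand in for speech inputs and outputs, `back_translate` is a hypothetical callable representing the fine-tuned model translating a candidate back into the source language, and `token_f1` is a simple unigram-overlap stand-in for whatever similarity measure the paper actually uses. The `prompt`/`chosen`/`rejected` layout is a common convention for preference data and is not taken from the paper.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def token_f1(reference: str, hypothesis: str) -> float:
    """Unigram F1 overlap. A stand-in metric (assumption), not the paper's."""
    ref = Counter(reference.lower().split())
    hyp = Counter(hypothesis.lower().split())
    overlap = sum((ref & hyp).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def build_preference_pair(
    source: str,
    candidates: List[str],
    back_translate: Callable[[str], str],  # hypothetical: target -> source language
    min_margin: float = 0.0,
) -> Optional[Dict[str, str]]:
    """Rank self-sampled candidates by how well their back-translation
    matches the source, then pair the best against the worst.

    Returns None when there are fewer than two candidates or when the
    score gap does not exceed min_margin (no usable preference signal).
    """
    if len(candidates) < 2:
        return None
    scored = [(token_f1(source, back_translate(c)), c) for c in candidates]
    best = max(scored, key=lambda item: item[0])
    worst = min(scored, key=lambda item: item[0])
    if best[0] - worst[0] <= min_margin:
        return None
    return {"prompt": source, "chosen": best[1], "rejected": worst[1]}


# Toy usage: a lookup table plays the role of the back-translation model.
toy_back_translations = {
    "le chat dort": "the cat sleeps",
    "le chien dort": "the dog sleeps",
    "chat": "cat",
}
pair = build_preference_pair(
    "the cat sleeps",
    list(toy_back_translations),
    toy_back_translations.__getitem__,
)
print(pair)
```

Pairs built this way need no human annotation, which matches the abstract's description. The `min_margin` parameter is an added design choice (hypothetical) for discarding pairs whose candidates score too closely to carry a clear preference.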

