CL · May 27, 2025

GMU Systems for the IWSLT 2025 Low-Resource Speech Translation Shared Task

CMU
arXiv:2505.21781v1 · 2 citations · h-index: 33 · IWSLT
Originality: Synthesis-oriented
AI Analysis

This work addresses low-resource speech translation in the IWSLT 2025 shared task, presenting incremental improvements through fine-tuning and initialization strategies.

The paper tackles low-resource speech translation by fine-tuning SeamlessM4T-v2 for ASR, MT, and E2E ST across multiple language pairs. It finds that direct E2E fine-tuning yields strong results, that initializing with a fine-tuned ASR encoder improves performance on languages SeamlessM4T-v2 was not trained on, and that multi-task training offers slight benefits.
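Multi-task training is often implemented as a weighted combination of per-task losses computed on the same training step. The sketch below assumes that formulation; the task names (`st`, `asr`, `mt`), the helper `multitask_loss`, and the default weight of 1.0 are hypothetical and are not the paper's reported configuration.

```python
from typing import Dict


def multitask_loss(task_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-task losses into one training objective (hypothetical scheme).

    task_losses: per-batch losses keyed by task name, e.g. {"st": ..., "asr": ..., "mt": ...}
    weights: per-task weights; any task missing from `weights` gets weight 1.0
    """
    return sum(weights.get(task, 1.0) * loss for task, loss in task_losses.items())


# Example: the ST loss at full weight, auxiliary ASR and MT losses down-weighted.
total = multitask_loss({"st": 2.0, "asr": 1.0, "mt": 0.5}, {"asr": 0.5, "mt": 0.5})
print(total)  # 2.75
```

Down-weighting the auxiliary tasks keeps E2E ST as the primary objective while still letting ASR and MT supervision shape the shared components, which is one plausible reason such training helps only slightly.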

This paper describes the GMU systems for the IWSLT 2025 low-resource speech translation shared task. We trained systems for all language pairs, except for Levantine Arabic. We fine-tuned SeamlessM4T-v2 for automatic speech recognition (ASR), machine translation (MT), and end-to-end speech translation (E2E ST). The ASR and MT models are also used to form cascaded ST systems. Additionally, we explored various training paradigms for E2E ST fine-tuning, including direct E2E fine-tuning, multi-task training, and parameter initialization using components from fine-tuned ASR and/or MT models. Our results show that (1) direct E2E fine-tuning yields strong results; (2) initializing with a fine-tuned ASR encoder improves ST performance on languages SeamlessM4T-v2 has not been trained on; (3) multi-task training can be slightly helpful.
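Parameter initialization from fine-tuned components can be pictured as assembling the E2E ST model's starting weights from several checkpoints. The sketch below assumes the ST model consists of a speech encoder and a text decoder that the MT path also uses. The parameter-name prefixes (`speech_encoder.`, `text_decoder.`), the helper `init_st_params`, and the use of plain lists instead of tensors are all hypothetical simplifications, not SeamlessM4T-v2's actual checkpoint layout.

```python
from typing import Dict, List, Optional

# A "state dict" maps parameter names to values. Plain lists of floats stand in
# for tensors so the sketch runs without a deep-learning framework.
StateDict = Dict[str, List[float]]


def init_st_params(
    base: StateDict,
    asr: Optional[StateDict] = None,
    mt: Optional[StateDict] = None,
    encoder_prefix: str = "speech_encoder.",  # hypothetical name prefix
    decoder_prefix: str = "text_decoder.",    # hypothetical name prefix
) -> StateDict:
    """Start from the base model's parameters, then optionally replace the
    speech encoder with a fine-tuned ASR model's and the text decoder with a
    fine-tuned MT model's. Only names already present in `base` are copied,
    so components irrelevant to ST (e.g. an MT text encoder) are ignored."""
    params = dict(base)  # shallow copy; `base` itself is left unchanged

    def override(source: StateDict, prefix: str) -> None:
        for name, value in source.items():
            if name.startswith(prefix) and name in params:
                params[name] = value

    if asr is not None:
        override(asr, encoder_prefix)
    if mt is not None:
        override(mt, decoder_prefix)
    return params


base = {"speech_encoder.w": [0.0], "text_decoder.w": [0.0], "lm_head.w": [0.0]}
asr = {"speech_encoder.w": [1.0], "text_decoder.w": [9.0]}
mt = {"text_decoder.w": [2.0], "text_encoder.w": [5.0]}
st_init = init_st_params(base, asr=asr, mt=mt)
print(st_init)
```

Passing only `asr` corresponds to initializing with a fine-tuned ASR encoder, the setting the abstract reports as helpful for languages SeamlessM4T-v2 has not been trained on; passing both mirrors the "ASR and/or MT" combinations it explores.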
