CL | Jul 25, 2025

HITSZ's End-To-End Speech Translation Systems Combining Sequence-to-Sequence Auto Speech Recognition Model and Indic Large Language Model for IWSLT 2025 in Indic Track

Xuchen Wei, Yangxin Wu, Yaoyin Zhang, Henglyu Liu, Kehai Chen, Xuefeng Bai, Min Zhang

arXiv:2507.19616v1 | 1 citation | h-index: 10 | IWSLT

Originality: Synthesis-oriented

AI Analysis

This work addresses speech translation for low-resource Indic languages, presenting an incremental improvement by combining existing models.

The paper tackles speech-to-text translation for English-Indic language pairs in a low-resource scenario by integrating the Whisper ASR model with the Krutrim Indic LLM, achieving average BLEU scores of 28.88 for English-to-Indic and 27.86 for Indic-to-English. It also explores the Chain-of-Thought method, which yielded large gains on successfully parsed outputs (e.g., a 13.84 BLEU increase for Tamil-to-English), although the model did not consistently adhere to the required CoT output format.

This paper presents HITSZ's submission for the IWSLT 2025 Indic track, focusing on speech-to-text translation (ST) for English-to-Indic and Indic-to-English language pairs. To enhance translation quality in this low-resource scenario, we propose an end-to-end system integrating the pre-trained Whisper automatic speech recognition (ASR) model with Krutrim, an Indic-specialized large language model (LLM). Experimental results demonstrate that our end-to-end system achieved average BLEU scores of 28.88 for English-to-Indic directions and 27.86 for Indic-to-English directions. Furthermore, we investigated the Chain-of-Thought (CoT) method. While this method showed potential for significant translation quality improvements on successfully parsed outputs (e.g., a 13.84 BLEU increase for Tamil-to-English), we observed challenges in ensuring that the model consistently adheres to the required CoT output format.
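The CoT formatting issue mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the output format the system expects, so the "Transcription: ... Translation: ..." layout below is purely an assumption, as are the helper names parse_cot_output and parse_rate. The sketch only shows how one might separate parsable outputs from format violations and measure how often the model follows the layout.

```python
import re
from typing import Optional

# Hypothetical CoT output layout (an assumption, not the paper's format):
#   Transcription: <source-language text>
#   Translation: <target-language text>
COT_PATTERN = re.compile(
    r"Transcription:\s*(?P<transcript>.+?)\s*Translation:\s*(?P<translation>.+)",
    re.DOTALL,
)


def parse_cot_output(text: str) -> Optional[dict]:
    """Return transcript and translation, or None if the layout is violated."""
    match = COT_PATTERN.search(text)
    if match is None:
        return None
    return {
        "transcript": match.group("transcript").strip(),
        "translation": match.group("translation").strip(),
    }


def parse_rate(outputs: list[str]) -> float:
    """Fraction of model outputs that follow the expected layout."""
    if not outputs:
        return 0.0
    parsed = sum(parse_cot_output(o) is not None for o in outputs)
    return parsed / len(outputs)


# Illustrative model outputs (made up for this sketch).
outputs = [
    "Transcription: vanakkam\nTranslation: Hello",
    "Hello",  # the intermediate step is missing, so parsing fails
]
```

Under this assumed layout, scores reported "on successfully parsed outputs" would cover only the items for which parse_cot_output returns a result, so the parse rate is worth reporting alongside any BLEU gain.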
