AS, AI, CL · Aug 15, 2025

Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge

Xiaoxiao Li, An Zhu, Youhai Jiang, Fengjie Zhu

arXiv:2508.14916v1 · Workshop on Multilingual Conversational Speech Language Model (MLC-SLM)

Originality: Synthesis-oriented

AI Analysis

This work addresses multilingual ASR within a specific challenge setting and represents an incremental improvement built from a hybrid of existing pretrained components.

The paper tackles multilingual speech recognition for the MLC-SLM 2025 Challenge by combining a frozen Whisper encoder, a trainable adaptor, and a frozen Qwen2.5 LLM fine-tuned with LoRA, achieving a WER/CER of 9.83% across 11 languages and ranking third.
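The headline metric is an edit-distance ratio: WER = (S + D + I) / N, where S, D, and I are the word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, and N is the number of reference words. CER applies the same formula to characters and is typically used for languages written without spaces. The following is a minimal, generic corpus-level sketch, not the challenge's official scoring script: text normalization, which languages were scored by WER versus CER, and how the 11 languages were pooled into the single 9.83% figure are not specified here and are left out. Dropping spaces before computing CER is an assumption, and `edit_distance` and `error_rate` are hypothetical helper names.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference token r
                curr[j - 1] + 1,         # insertion of hypothesis token h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level WER (unit='word') or CER (unit='char'): total edits / total reference tokens."""
    if unit == "word":
        split = str.split
    else:
        # Assumption: spaces are ignored when scoring at the character level.
        split = lambda s: list(s.replace(" ", ""))
    errors = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return errors / total


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(round(error_rate(["the cat sat on the mat"], ["the cat sit on mat"]), 4))  # 0.3333
# One character substitution over 4 reference characters.
print(error_rate(["今天天气"], ["今天天汽"], unit="char"))  # 0.25
```

Summing edits and reference lengths over the whole corpus before dividing weights each utterance by its length, which is the usual way a single corpus-level WER/CER is reported.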

This paper presents the architecture and performance of a novel multilingual automatic speech recognition (ASR) system developed by the Transsion Speech Team for Track 1 of the MLC-SLM 2025 Challenge. The proposed system comprises three key components: 1) a frozen speech encoder based on Whisper-large-v3, which leverages large-scale pretraining for robust acoustic feature extraction; 2) a trainable adaptor module that applies a Linear-ReLU-Linear transformation to align speech and text representations; and 3) a frozen Qwen2.5-7B-Instruct large language model (LLM) equipped with trainable LoRA modules for contextual linguistic decoding. By systematically combining pretrained models with task-specific fine-tuning, the system achieved a word/character error rate (WER/CER) of 9.83% across 11 languages on the evaluation set and ranked third among global participants.
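The data flow through the three components can be sketched in NumPy. This is a minimal, non-authoritative illustration under several assumptions: the adaptor is a plain Linear-ReLU-Linear projection with a hypothetical hidden width and no frame downsampling (the actual adaptor may also stack or subsample frames), and LoRA follows its standard formulation, in which a frozen weight `w0` is augmented by a trainable low-rank product `a @ b` scaled by alpha / r. Random arrays stand in for the frozen Whisper-large-v3 encoder output and for one frozen Qwen2.5-7B projection matrix, and toy dimensions replace the real widths (1280 for the Whisper-large-v3 encoder, 3584 for the Qwen2.5-7B hidden size). `Adaptor`, `LoRALinear`, the rank, and alpha are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical). Real widths would be 1280 (Whisper-large-v3 encoder)
# and 3584 (Qwen2.5-7B hidden size); the adaptor hidden width is an assumption.
ENC_DIM, HIDDEN_DIM, LLM_DIM = 16, 32, 24


def relu(x):
    return np.maximum(x, 0.0)


class Adaptor:
    """Trainable Linear -> ReLU -> Linear map from encoder space to LLM embedding space."""

    def __init__(self, enc_dim, hidden_dim, llm_dim, rng):
        self.w1 = rng.normal(0.0, 0.02, (enc_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.02, (hidden_dim, llm_dim))
        self.b2 = np.zeros(llm_dim)

    def __call__(self, x):
        # x: (frames, enc_dim) -> (frames, llm_dim)
        return relu(x @ self.w1 + self.b1) @ self.w2 + self.b2


class LoRALinear:
    """Frozen pretrained weight w0 plus a trainable low-rank update scaled by alpha / rank."""

    def __init__(self, w0, rank, alpha, rng):
        d_in, d_out = w0.shape
        self.w0 = w0                                   # frozen: never updated
        self.a = rng.normal(0.0, 0.02, (d_in, rank))   # trainable
        self.b = np.zeros((rank, d_out))               # trainable, zero-init so the update starts at 0
        self.scale = alpha / rank

    def __call__(self, x):
        return x @ self.w0 + self.scale * (x @ self.a @ self.b)


# Stand-in for frozen Whisper encoder features of one utterance: 50 frames.
encoder_out = rng.normal(size=(50, ENC_DIM))
adaptor = Adaptor(ENC_DIM, HIDDEN_DIM, LLM_DIM, rng)
speech_embeds = adaptor(encoder_out)  # fed to the LLM alongside text embeddings

# Stand-in for one frozen LLM projection wrapped with LoRA (rank and alpha are hypothetical).
q_proj = LoRALinear(rng.normal(size=(LLM_DIM, LLM_DIM)), rank=4, alpha=8, rng=rng)
out = q_proj(speech_embeds)
print(speech_embeds.shape, out.shape)  # (50, 24) (50, 24)
```

Because `b` starts at zero, the LoRA-wrapped layer initially reproduces the frozen layer exactly. During fine-tuning only the adaptor weights and the low-rank factors `a` and `b` would be updated, which keeps the number of trainable parameters small relative to the frozen encoder and LLM.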
