AS, AI, CL | Aug 15, 2025

Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge

arXiv:2508.14916v1 | 1 citation | h-index: 1 | Workshop on Multilingual Conversational Speech Language Model (MLC-SLM)
Originality: Synthesis-oriented
AI Analysis

This work addresses multilingual ASR within the setting of a specific challenge and represents an incremental improvement achieved by combining existing pretrained components.

The paper tackled multilingual speech recognition for the MLC-SLM 2025 Challenge by combining a frozen Whisper encoder, a trainable adaptor, and a frozen Qwen2.5 LLM with LoRA, achieving a WER/CER of 9.83% across 11 languages and ranking third.

This paper presents the architecture and performance of a novel Multilingual Automatic Speech Recognition (ASR) system developed by the Transsion Speech Team for Track 1 of the MLC-SLM 2025 Challenge. The proposed system comprises three key components: 1) a frozen Whisper-large-v3-based speech encoder, leveraging large-scale pretraining to ensure robust acoustic feature extraction; 2) a trainable adaptor module using a Linear-ReLU-Linear transformation to align speech and text representations; and 3) a frozen Qwen2.5-7B-Instruct large language model (LLM) integrated with trainable LoRA for optimized contextual linguistic decoding. By systematically combining pretrained models with task-specific fine-tuning, the system achieved a word/character error rate (WER/CER) of 9.83% across 11 languages in the evaluation set and ranked third among global participants.
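To make the second component concrete, the following is a minimal sketch of a Linear-ReLU-Linear adaptor that maps encoder frame features into an LLM embedding space. It is not the authors' implementation: the class name `Adaptor`, the helper functions, the weight initialization, and the toy dimensions are all assumptions chosen for illustration, and the abstract does not say whether frames are stacked or downsampled before projection. It uses only the Python standard library so the data flow is easy to follow.

```python
import random


def linear(x, weight, bias):
    """Apply y = W x + b to one feature vector (weight has shape out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def init_linear(in_dim, out_dim, rng):
    """Hypothetical initialization: small uniform weights, zero bias."""
    scale = 1.0 / in_dim ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


class Adaptor:
    """Linear -> ReLU -> Linear projection from encoder space to LLM embedding space.

    In the described system this is the trainable part between a frozen
    speech encoder and a frozen LLM; training is out of scope for this sketch.
    """

    def __init__(self, enc_dim, hidden_dim, llm_dim, seed=0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_linear(enc_dim, hidden_dim, rng)
        self.w2, self.b2 = init_linear(hidden_dim, llm_dim, rng)

    def __call__(self, frames):
        # frames: list of encoder feature vectors, one per time step
        return [
            linear(relu(linear(f, self.w1, self.b1)), self.w2, self.b2)
            for f in frames
        ]


# Toy dimensions (assumed); real encoder and LLM widths are far larger.
adaptor = Adaptor(enc_dim=8, hidden_dim=16, llm_dim=4)
frames = [[0.1 * i] * 8 for i in range(5)]  # 5 fake encoder frames
projected = adaptor(frames)
print(len(projected), len(projected[0]))
```

In a full pipeline the projected vectors would be concatenated with the text prompt embeddings and passed to the LLM, with gradients flowing only into the adaptor and the LoRA weights.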

