CL AI SD AS

Jun 16, 2025

Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models

arXiv:2506.13300v34.92 citations

h-index: 2

Workshop on Multilingual Conversational Speech Language Model (MLC-SLM)

Originality: Incremental advance

AI Analysis

This work addresses speech processing challenges in multilingual conversational settings, representing an incremental improvement over existing baselines.

The paper tackles automatic speech recognition and speaker diarization by introducing a multi-stage training pipeline that enhances reasoning and self-correction in speech language models, achieving a WER/CER of 11.57% for Track 1 and a tcpWER/tcpCER of 17.67% for Track 2.

This paper presents Seewo's systems for both tracks of the Multilingual Conversational Speech Language Model Challenge (MLC-SLM), addressing automatic speech recognition (ASR) and speaker diarization with ASR (SD-ASR). We introduce a multi-stage training pipeline that explicitly enhances reasoning and self-correction in speech language models for ASR. Our approach combines curriculum learning for progressive capability acquisition, Chain-of-Thought data augmentation to foster intermediate reflection, and Reinforcement Learning with Verifiable Rewards (RLVR) to further refine self-correction through reward-driven optimization. This approach achieves substantial improvements over the official challenge baselines. On the evaluation set, our best system attains a WER/CER of 11.57% for Track 1 and a tcpWER/tcpCER of 17.67% for Track 2. Comprehensive ablation studies demonstrate the effectiveness of each component under challenge constraints.
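The WER/CER figures and the "verifiable reward" idea mentioned above can be illustrated with a minimal sketch. It computes an error rate as token-level edit distance divided by reference length, then maps it to a reward in [0, 1]. The function names (`edit_distance`, `error_rate`, `verifiable_reward`) and the reward shape `1 - error rate` are assumptions made here for illustration; the abstract does not specify the reward used in the paper's RLVR stage, and the challenge's official scoring may apply text normalization not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """WER (unit="word") or CER (unit="char") of a hypothesis transcript."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        # Character level: drop spaces, as is common for CER on unsegmented text.
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def verifiable_reward(reference, hypothesis, unit="word"):
    """Hypothetical reward: 1 minus the error rate, clipped at 0 (assumption)."""
    return max(0.0, 1.0 - error_rate(reference, hypothesis, unit))
```

A reward of this kind is "verifiable" in the sense that it is computed directly from the reference transcript rather than from a learned reward model. Note that an error rate can exceed 1.0 when the hypothesis contains many insertions, which is why the sketch clips the reward at zero.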
