CL SD AS | Aug 15, 2025

Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation

Chenyang Le, Yinfeng Xia, Huiyan Li, Manhong Wang, Yutao Sun, Xingyang Ma, Yanmin Qian

arXiv:2508.11189v1 | h-index: 4 | INTERSPEECH

Originality: Incremental advance

AI Analysis

This work addresses efficiency challenges in deploying multilingual speech translation models, particularly for local deployment, though it appears incremental because it builds on existing models and techniques such as Whisper.

The paper tackles the trade-off between inference efficiency and translation quality in multilingual speech translation models, which often have large parameter counts. It proposes a Parasitic Dual-Scale Approach that reports state-of-the-art performance across six languages with a 2.6x speedup over the original Whisper Medium model.

Recent advancements in speech-to-text translation have led to the development of multilingual models capable of handling multiple language pairs simultaneously. However, these unified models often suffer from large parameter sizes, making it challenging to balance inference efficiency and performance, particularly in local deployment scenarios. We propose an innovative Parasitic Dual-Scale Approach, which combines an enhanced speculative sampling method with model compression and knowledge distillation techniques. Building on the Whisper Medium model, we enhance it for multilingual speech translation into whisperM2M, and integrate our novel KVSPN module, achieving state-of-the-art (SOTA) performance across six popular languages with improved inference efficiency. KVSPN enables a 40% speedup with no BLEU score degradation. Combined with distillation methods, it represents a 2.6x speedup over the original Whisper Medium with superior performance.
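
For readers unfamiliar with speculative sampling, the general idea can be sketched as follows. This is a minimal, generic illustration of greedy speculative decoding, not the paper's KVSPN module, whose design is not described in the abstract. The toy "models" (toy_target, toy_draft) and all function names are hypothetical stand-ins: a cheap draft model proposes several tokens, the large target model checks them, and only tokens the target agrees with are kept. Under greedy verification the output matches what the target alone would produce, which is why this family of methods can speed up decoding without changing translation quality.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function mapping a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prefix: List[int], n_new: int) -> List[int]:
    """Plain autoregressive greedy decoding: one model call per new token."""
    out = list(prefix)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prefix: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification rounds)."""
    out = list(prefix)
    goal = len(prefix) + n_new
    rounds = 0
    while len(out) < goal:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2) The target verifies the proposal. A real system scores all k
        #    positions in one batched forward pass; this toy calls it per position.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if expected == tok:
                accepted.append(tok)        # draft token confirmed
            else:
                accepted.append(expected)   # first mismatch: use the target's token
                break
        else:
            # Every draft token was accepted, so the target adds one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], rounds


# Hypothetical toy models (assumptions for illustration only).
def toy_target(seq: List[int]) -> int:
    return (3 * seq[-1] + 1) % 11


def toy_draft(seq: List[int]) -> int:
    guess = toy_target(seq)
    # The draft disagrees with the target whenever the context length is a multiple of 4.
    return (guess + 1) % 11 if len(seq) % 4 == 0 else guess


if __name__ == "__main__":
    reference = greedy_decode(toy_target, [1], 20)
    tokens, rounds = speculative_decode(toy_target, toy_draft, [1], 20, k=4)
    print(tokens == reference, rounds)
```

In this sketch the number of verification rounds is smaller than the number of generated tokens whenever the draft is right often enough, which is the source of the speedup. How the paper's method drafts tokens and reaches its reported 40% figure is beyond what the abstract states.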
