CL, SD, AS | Aug 15, 2025

Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation

arXiv:2508.11189v1 | h-index: 4 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses the efficiency challenges of deploying multilingual speech translation models, particularly in local deployment scenarios. It appears incremental, since it builds on existing methods such as Whisper.

The paper tackles the problem of balancing inference efficiency and performance in multilingual speech translation models, which often have large parameter sizes. It proposes a Parasitic Dual-Scale Approach that achieves state-of-the-art performance across six languages with a 2.6x speedup over the original Whisper Medium model.

Recent advancements in speech-to-text translation have led to the development of multilingual models capable of handling multiple language pairs simultaneously. However, these unified models often suffer from large parameter sizes, making it challenging to balance inference efficiency and performance, particularly in local deployment scenarios. We propose an innovative Parasitic Dual-Scale Approach, which combines an enhanced speculative sampling method with model compression and knowledge distillation techniques. Building on the Whisper Medium model, we enhance it for multilingual speech translation into whisperM2M, and integrate our novel KVSPN module, achieving state-of-the-art (SOTA) performance across six popular languages with improved inference efficiency. KVSPN enables a 40% speedup with no BLEU score degradation. Combined with distillation methods, it represents a 2.6x speedup over the original Whisper Medium with superior performance.
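
For readers unfamiliar with speculative sampling, the following is a minimal sketch of the generic greedy variant, not the paper's KVSPN module, whose design is not described in this abstract. A cheap draft model proposes `k` tokens and the large target model checks them, so the output matches target-only greedy decoding. The names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, and the toy models are stand-ins for real networks.

```python
from typing import Callable, List

Token = int
# Assumption: a "model" here is just a function returning the greedy next token.
NextToken = Callable[[List[Token]], Token]


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: draft proposes k tokens, target verifies them."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target verifies the proposal. A real system would score all
        #    k positions in one batched forward pass; this sketch calls the
        #    target once per position for clarity.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch and stop
                break
        else:
            # Every drafted token was accepted: the target adds one more token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:limit]


def toy_target(seq: List[Token]) -> Token:
    """Toy 'large' model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> Token:
    """Toy 'small' model: agrees with the target except after tokens where t % 3 == 2."""
    return 0 if seq[-1] % 3 == 2 else (seq[-1] + 1) % 10
```

Calling `speculative_decode(toy_target, toy_draft, [0], 12)` returns the same sequence as `greedy_decode(toy_target, [0], 12)`. Any speedup in practice depends on how often the draft's tokens are accepted and on how much cheaper the draft is than the target.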

