CL · AS · Mar 5

Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR

arXiv:2603.05354v1
Originality: Incremental advance
AI Analysis

This work addresses the computational cost of adapting large speech foundation models for multi-domain ASR. It offers model merging as a scalable alternative to repeated full fine-tuning, which is relevant to researchers and practitioners who maintain several domain-specific checkpoints of a large ASR model.

This paper explores model merging for multi-domain Automatic Speech Recognition (ASR) by benchmarking 11 merging algorithms across 10 European Portuguese domains. The authors propose BoostedTSV-M, a new merging algorithm that outperforms full fine-tuning on European Portuguese while preserving out-of-distribution generalization in a single model.

Model merging is a scalable alternative to multi-task training that combines the capabilities of multiple specialised models into a single model. This is particularly attractive for large speech foundation models, which are typically adapted through domain-specific fine-tuning, resulting in multiple customised checkpoints; repeating full fine-tuning whenever new data becomes available is computationally prohibitive. In this work, we study model merging for multi-domain ASR and benchmark 11 merging algorithms on 10 European Portuguese domains, evaluating in-domain accuracy, robustness under distribution shift, and English and multilingual performance. We further propose BoostedTSV-M, a new merging algorithm based on TSV-M that mitigates rank collapse via singular-value boosting and improves numerical stability. Overall, our approach outperforms full fine-tuning on European Portuguese while preserving out-of-distribution generalisation in a single model.
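
To make the idea of merging concrete, the sketch below shows plain task arithmetic, one of the simplest merging schemes: each fine-tuned checkpoint is reduced to a "task vector" (its difference from the shared base weights), and the scaled sum of those vectors is added back to the base. This is not BoostedTSV-M or TSV-M, and the abstract does not say which 11 algorithms were benchmarked; the function name `merge_task_arithmetic`, the scaling factor `alpha`, and the toy 2x2 weights are all assumptions made for illustration.

```python
import numpy as np


def merge_task_arithmetic(base, finetuned, alpha=1.0):
    """Merge fine-tuned checkpoints into one set of weights via task arithmetic.

    base:      dict mapping parameter name -> np.ndarray (shared pre-trained weights)
    finetuned: list of dicts with the same keys and shapes (one per domain)
    alpha:     hypothetical scaling factor applied to the summed task vectors
    """
    merged = {}
    for name, w_base in base.items():
        # Task vector: what each domain-specific fine-tuning changed in this parameter.
        task_vectors = [ckpt[name] - w_base for ckpt in finetuned]
        # Add the scaled sum of all task vectors back onto the base weights.
        merged[name] = w_base + alpha * np.sum(task_vectors, axis=0)
    return merged


# Toy example: two "domains" that each modify a different entry of one matrix.
base = {"proj": np.zeros((2, 2))}
domain_a = {"proj": np.array([[1.0, 0.0], [0.0, 0.0]])}
domain_b = {"proj": np.array([[0.0, 0.0], [0.0, 2.0]])}

merged = merge_task_arithmetic(base, [domain_a, domain_b], alpha=0.5)
print(merged["proj"])
```

In this toy case the two task vectors touch different entries, so both updates survive in the merged matrix. When task vectors overlap or conflict, a plain sum can degrade individual domains, which is the kind of interference that more elaborate merging algorithms are designed to reduce.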

