CL AI LG SD AS | May 23, 2025

Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models

Chi-Yuan Hsiao, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Wei-Chih Chen, Hung-yi Lee

MIT

arXiv:2505.17496v1 | 13 citations | h-index: 11 | INTERSPEECH

Originality: Incremental advance

AI Analysis

This paper addresses knowledge loss in multi-stage training, a problem faced by researchers developing Spoken Language Models. The advance is incremental because it builds on existing continual learning techniques.

This paper tackles catastrophic forgetting in end-to-end training of Spoken Language Models by evaluating three mitigation strategies: model merging, discounting the LoRA scaling factor, and experience replay. It finds that experience replay is the most effective, and that combining it with the other methods yields further gains.

End-to-end training of Spoken Language Models (SLMs) commonly involves adapting pre-trained text-based Large Language Models (LLMs) to the speech modality through multi-stage training on diverse tasks such as ASR, TTS, and spoken question answering (SQA). Although this multi-stage continual learning equips LLMs with both speech understanding and generation capabilities, the substantial differences in task and data distributions across stages can lead to catastrophic forgetting, where previously acquired knowledge is lost. This paper investigates catastrophic forgetting and evaluates three mitigation strategies (model merging, discounting the LoRA scaling factor, and experience replay) to balance knowledge retention with new learning. Results show that experience replay is the most effective, with further gains achieved by combining it with other methods. These findings provide insights for developing more robust and efficient SLM training pipelines.
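The abstract names the three strategies but gives no formulas, so the following is only a minimal sketch of how each is commonly formulated, not the authors' implementation. It assumes model merging is a linear interpolation of weights, that the LoRA update is scaled by `alpha / rank` with an extra `discount` multiplier, and that experience replay mixes a fixed fraction of earlier-stage samples into each batch. All function names, parameters, and toy values are hypothetical.

```python
import random


def merge_weights(base, tuned, lam):
    """Model merging (assumed form): linear interpolation of two weight vectors.

    lam = 0 keeps the base weights, lam = 1 keeps the fine-tuned weights.
    """
    return [(1.0 - lam) * b + lam * t for b, t in zip(base, tuned)]


def lora_effective_weight(w0, delta, alpha, rank, discount=1.0):
    """Discounted LoRA scaling (assumed form): w0 + discount * (alpha / rank) * delta.

    `delta` stands in for the low-rank product B @ A, flattened to a list here.
    """
    scale = discount * alpha / rank
    return [w + scale * d for w, d in zip(w0, delta)]


def replay_batch(new_data, old_data, batch_size, replay_ratio, rng):
    """Experience replay (assumed form): draw part of each batch from earlier-stage data."""
    n_old = round(batch_size * replay_ratio)
    n_new = batch_size - n_old
    batch = rng.sample(old_data, n_old) + rng.sample(new_data, n_new)
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    # Toy weights: halfway between base and fine-tuned.
    print(merge_weights([1.0, 2.0], [3.0, 6.0], lam=0.5))  # [2.0, 4.0]

    # Halving the usual alpha / rank = 2.0 scale gives an effective scale of 1.0.
    print(lora_effective_weight([1.0, 1.0], [2.0, -2.0], alpha=16, rank=8, discount=0.5))  # [3.0, -1.0]

    # A batch of 8 with 25% replayed samples: 2 from the earlier stage, 6 from the current one.
    rng = random.Random(0)
    old_stage = [("asr", i) for i in range(10)]
    new_stage = [("sqa", i) for i in range(10)]
    print(replay_batch(new_stage, old_stage, batch_size=8, replay_ratio=0.25, rng=rng))
```

In this sketch the three knobs (`lam`, `discount`, `replay_ratio`) are independent, which is one way to picture the abstract's observation that replay can be combined with the other methods.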
