LGJan 19, 2025

Control LLM: Controlled Evolution for Intelligence Retention in LLM

Haichao Wei, Yunxiang Ren, Zhoutong Fu, Aman Lunia, Yi-Lin Chen, Alice Leung, Ya Xu

arXiv:2501.10979v29.43 citationsh-index: 4Has Code

Originality Incremental advance

AI Analysis

This addresses a key problem for organizations needing to update LLMs efficiently without retraining, though it is incremental as it builds on existing methods for catastrophic forgetting.

The paper tackles catastrophic forgetting in large language models during continuous training, proposing Control LLM, which uses parallel transformer blocks and interpolation to retain performance on existing tasks while integrating new knowledge, achieving improvements like +14.4% on Math-Hard and minimal degradation (<4.3% on MMLU).

Large Language Models (LLMs) demand significant computational resources, making it essential to enhance their capabilities without retraining from scratch. A key challenge in this domain is \textit{catastrophic forgetting} (CF), which hampers performance during Continuous Pre-training (CPT) and Continuous Supervised Fine-Tuning (CSFT). We propose \textbf{Control LLM}, a novel approach that leverages parallel pre-trained and expanded transformer blocks, aligning their hidden-states through interpolation strategies This method effectively preserves performance on existing tasks while seamlessly integrating new knowledge. Extensive experiments demonstrate the effectiveness of Control LLM in both CPT and CSFT. On Llama3.1-8B-Instruct, it achieves significant improvements in mathematical reasoning ($+14.4\%$ on Math-Hard) and coding performance ($+10\%$ on MBPP-PLUS). On Llama3.1-8B, it enhances multilingual capabilities ($+10.6\%$ on C-Eval, $+6.8\%$ on CMMLU, and $+30.2\%$ on CMMLU-0shot-CoT). It surpasses existing methods and achieves SOTA among open-source models tuned from the same base model, using substantially less data and compute. Crucially, these gains are realized while preserving strong original capabilities, with minimal degradation ($<4.3\% \text{on MMLU}$) compared to $>35\%$ in open-source Math and Coding models. This approach has been successfully deployed in LinkedIn's GenAI-powered job seeker and Ads unit products. To support further research, we release the training and evaluation code (https://github.com/linkedin/ControlLLM) along with models trained on public datasets (https://huggingface.co/ControlLLM) to the community.

View on arXiv PDF Code

Similar