Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction
This work addresses the alignment tax problem faced by developers of large language models. It offers a simple yet effective method for improving performance on knowledge and reasoning tasks, though the contribution is incremental, since it builds on existing fine-tuning and model merging techniques.
The paper tackles the deterioration of large language models on knowledge and reasoning benchmarks during supervised fine-tuning, known as the alignment tax. It introduces a disperse-then-merge framework that trains multiple sub-models on different portions of the data and merges them into a single model, outperforming methods such as data curation and training regularization on standard benchmarks.
Supervised fine-tuning (SFT) on an instruction-following corpus is a crucial approach toward the alignment of large language models (LLMs). However, the performance of LLMs on standard knowledge and reasoning benchmarks tends to deteriorate in the later stages of the SFT process, echoing the phenomenon of alignment tax. Based on our pilot study, we hypothesize that data biases are probably one cause of the phenomenon. To address the issue, we introduce a simple disperse-then-merge framework. Concretely, we disperse the instruction-following data into portions and train multiple sub-models, each on a different data portion. We then merge the sub-models into a single one via model merging techniques. Despite its simplicity, our framework outperforms various sophisticated methods such as data curation and training regularization on a series of standard knowledge and reasoning benchmarks.
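The disperse-then-merge procedure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the data is partitioned or which merging technique is used, so the sketch assumes a random, roughly equal split and uniform parameter averaging. The names `disperse`, `merge_average`, `disperse_then_merge`, and `toy_train` are hypothetical, and `toy_train` is only a stand-in for a real SFT run.

```python
import random
from typing import Callable, Dict, List, Sequence

# A "model" here is just a mapping from parameter name to a flat list of weights.
Params = Dict[str, List[float]]


def disperse(data: Sequence, k: int, seed: int = 0) -> List[list]:
    """Shuffle the data and split it into k roughly equal portions.

    Assumption: a random split; the source does not say how portions are chosen.
    """
    items = list(data)
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]


def merge_average(models: List[Params]) -> Params:
    """Merge sub-models by uniform element-wise averaging of their parameters.

    Assumption: plain averaging stands in for "model merging techniques".
    """
    merged: Params = {}
    for name in models[0]:
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


def disperse_then_merge(
    data: Sequence,
    k: int,
    train_fn: Callable[[Params, list], Params],
    base: Params,
) -> Params:
    """Train one sub-model per data portion from the same base, then merge."""
    portions = disperse(data, k)
    sub_models = [train_fn(base, portion) for portion in portions]
    return merge_average(sub_models)


def toy_train(base: Params, portion: list) -> Params:
    """Hypothetical stand-in for SFT: shift every weight by the portion mean."""
    shift = sum(portion) / len(portion)
    return {name: [w + shift for w in ws] for name, ws in base.items()}
```

For a quick check, `disperse_then_merge([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3, toy_train, {"w": [0.0, 1.0]})` trains three toy sub-models from the same base and averages them. In practice, `train_fn` would wrap a full fine-tuning run, and the averaging step could be replaced by whichever merging technique is preferred.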