LG · AI · CL · May 21, 2025

Merge to Mix: Mixing Datasets via Model Merging

arXiv:2505.16066v1 · 6 citations · h-index: 11 · DATA
Originality: Incremental advance
AI Analysis

The paper addresses dataset selection for fine-tuning large models, a choice that is critical for maximizing performance on downstream tasks. It is an incremental advance because it builds on existing model merging techniques.

The paper tackles the problem of efficiently composing dataset mixtures for fine-tuning large models. It proposes Merge to Mix, a method that uses model merging as a surrogate for full fine-tuning on each candidate mixture. The authors report that it surpasses state-of-the-art methods in dataset selection.

Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring multiple fine-tuning runs to achieve the desired outcome. We propose a novel method, $\textit{Merge to Mix}$, that accelerates composing dataset mixtures through model merging. Model merging is a recent technique that combines the abilities of multiple individually fine-tuned LMs into a single LM by using a few simple arithmetic operations. Our key insight is that merging models individually fine-tuned on each dataset in a mixture can effectively serve as a surrogate for a model fine-tuned on the entire mixture. Merge to Mix leverages this insight to accelerate selecting dataset mixtures without requiring full fine-tuning on each candidate mixture. Our experiments demonstrate that Merge to Mix surpasses state-of-the-art methods in dataset selection for fine-tuning LMs.
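
The surrogate idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract only says that merging uses "a few simple arithmetic operations", so the uniform parameter average below is an assumption, as are the exhaustive search over subsets, the helper names `merge` and `select_mixture`, and the toy weights and scoring function. Each "model" is a dictionary of NumPy arrays standing in for a checkpoint fine-tuned on one dataset.

```python
from itertools import combinations

import numpy as np


def merge(models):
    """Uniformly average the parameters of models that share one architecture.

    Assumption: plain averaging stands in for whatever merging rule is used.
    """
    keys = models[0].keys()
    return {k: np.mean([m[k] for m in models], axis=0) for k in keys}


def select_mixture(finetuned, score):
    """Score the merged surrogate of every non-empty subset of datasets.

    finetuned: dict mapping dataset name -> parameters fine-tuned on it alone.
    score: callable mapping parameters -> float (higher is better).
    Returns the best subset of dataset names and its surrogate score.
    """
    names = sorted(finetuned)
    best_subset, best_score = None, -np.inf
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            surrogate = merge([finetuned[n] for n in subset])
            s = score(surrogate)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score


# Hypothetical toy data: one 2-d weight vector per individually fine-tuned model.
finetuned = {
    "A": {"w": np.array([1.0, 0.0])},
    "B": {"w": np.array([0.0, 1.0])},
    "C": {"w": np.array([-1.0, -1.0])},
}

# Hypothetical downstream metric: closeness to a target weight vector.
target = np.array([0.5, 0.5])


def score(model):
    return -float(np.sum((model["w"] - target) ** 2))


best_subset, best_score = select_mixture(finetuned, score)
print(best_subset, best_score)
```

In this sketch, the expensive step (fine-tuning) happens once per dataset, while each candidate mixture costs only an average and an evaluation. With N datasets, that is N fine-tuning runs instead of one run for each of the 2^N - 1 non-empty mixtures. A final fine-tuning run on the selected mixture would still be needed to obtain the actual model.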

