LG, AI, CL | Aug 16, 2025

DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections

Peking U
arXiv:2508.12116v1 | 3 citations | h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses how to manage instruction-tuning dataset mixtures efficiently, a problem relevant to researchers and practitioners in natural language processing. It represents an incremental improvement in mixture optimization techniques.

The paper tackles the challenge of dynamically balancing and optimizing mixtures of instruction-tuning datasets. It proposes DynamixSFT, a method that formulates mixture selection as a multi-armed bandit and combines Prior-scaled Boltzmann Exploration with a 1-Step Look-ahead Reward. Applied to the Tulu-v2-mixture collection, it achieves up to a 2.2% performance improvement across 10 benchmarks.

As numerous instruction-tuning datasets continue to emerge during the post-training stage, dynamically balancing and optimizing their mixtures has become a critical challenge. To address this, we propose DynamixSFT, a dynamic and automated method for instruction-tuning dataset mixture optimization. We formulate the problem as a multi-armed bandit setup and introduce a Prior-scaled Boltzmann Exploration that softly anchors the updated sampling distribution to the original dataset proportions, thereby preserving the inherent diversity and coverage of the collection. Sampling probabilities are updated using a lightweight 1-Step Look-ahead Reward, reflecting how much the dataset contributes to improving the model's performance at its current state. When applied to the Tulu-v2-mixture collection comprising 16 instruction-tuning datasets, DynamixSFT achieves up to a 2.2% performance improvement across 10 benchmarks. Furthermore, we provide a comprehensive analysis and visualizations to offer deeper insights into the adaptive dynamics of our method.
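The abstract does not give the exact update rules, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each dataset (arm) keeps a scalar value estimate, that the sampling probability is proportional to the original dataset proportion times a Boltzmann factor exp(q / temperature), and that the value estimate is an exponential moving average of observed rewards. The function names, the temperature, the step size, and the reward value are all hypothetical; in the paper the reward comes from a 1-Step Look-ahead measurement, which is treated here as a given number.

```python
import math


def prior_scaled_boltzmann(priors, q_values, temperature=1.0):
    """Return sampling probabilities p_i proportional to prior_i * exp(q_i / temperature).

    Assumed form (not taken from the paper): the prior multiplies the
    Boltzmann weight, so equal value estimates reproduce the original mixture.
    """
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [
        prior * math.exp((q - q_max) / temperature)
        for prior, q in zip(priors, q_values)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def update_value(q_values, arm, reward, step_size=0.1):
    """Move the value estimate of one arm toward the observed reward (EMA)."""
    updated = list(q_values)
    updated[arm] += step_size * (reward - updated[arm])
    return updated


# Hypothetical three-dataset collection with its original proportions.
priors = [0.5, 0.3, 0.2]
q_values = [0.0, 0.0, 0.0]

# With no reward signal yet, the sampling distribution equals the priors.
initial_probs = prior_scaled_boltzmann(priors, q_values)

# Suppose a look-ahead step on dataset 2 yields a (made-up) reward of 1.0.
q_values = update_value(q_values, arm=2, reward=1.0)
updated_probs = prior_scaled_boltzmann(priors, q_values)
```

Under these assumptions, the prior acts as a soft anchor: a dataset gains sampling mass only to the extent that its value estimate exceeds the others, and the distribution falls back to the original proportions whenever the estimates are equal.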

