LG, Jun 3

TANDEM: Bi-Level Data Mixture Optimization with Twin Networks

arXiv:2606.04401
AI Analysis

For LLM practitioners, TANDEM provides a principled method, backed by theoretical guarantees, for optimizing training data composition, and it outperforms prior approaches.

TANDEM introduces a bi-level optimization approach for domain mixture ratios in LLM training, solved via twin networks that measure data efficacy. It achieves significant performance improvements in data-restricted and supervised fine-tuning scenarios.

The capabilities of large language models (LLMs) depend significantly on training data drawn from various domains. Optimizing domain-specific mixture ratios can be modeled as a bi-level optimization problem, which we simplify into a single-level penalized form and solve with twin networks: a proxy model trained on primary data and a dynamically updated reference model trained with additional data. Our proposed method, Twin Networks for bi-level DatA mixturE optiMization (TANDEM), measures data efficacy through the difference between the twin models and up-weights domains that benefit more from the additional data. Compared to prior approaches, TANDEM offers theoretical guarantees and wider applicability. Furthermore, our bi-level perspective suggests new settings for studying domain reweighting, such as data-restricted scenarios and supervised fine-tuning, where optimized mixture ratios significantly improve performance. Extensive experiments validate TANDEM's effectiveness in all scenarios.
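The reweighting idea in the abstract (up-weight domains where the reference model, trained with additional data, does better than the proxy model) can be illustrated with a minimal sketch. The abstract does not state the update rule, so everything here is an assumption made for illustration: the function name `update_mixture`, the use of per-domain loss gaps as the "data efficacy" signal, the exponentiated (multiplicative-weights) update, and the `step_size` parameter are hypothetical and are not taken from the paper.

```python
import math


def update_mixture(weights, proxy_losses, reference_losses, step_size=1.0):
    """Illustrative domain reweighting step (NOT the paper's actual rule).

    Assumption: a domain's benefit from additional data is approximated by
    how much lower the reference model's loss is than the proxy model's loss.
    """
    if not (len(weights) == len(proxy_losses) == len(reference_losses)):
        raise ValueError("all inputs must have one entry per domain")

    # Per-domain gap: positive when the reference model (with additional
    # data) achieves a lower loss than the proxy model on that domain.
    gaps = [p - r for p, r in zip(proxy_losses, reference_losses)]

    # Exponentiated update; subtracting the max gap keeps exp() stable
    # without changing the normalized result.
    max_gap = max(gaps)
    scaled = [w * math.exp(step_size * (g - max_gap))
              for w, g in zip(weights, gaps)]

    # Renormalize so the mixture ratios sum to 1.
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical example with three domains and made-up loss values.
weights = [1 / 3, 1 / 3, 1 / 3]
proxy_losses = [2.0, 2.0, 2.0]
reference_losses = [1.5, 1.9, 2.0]
new_weights = update_mixture(weights, proxy_losses, reference_losses)
```

In this toy setting the first domain shows the largest gap between the twin models, so it receives the largest share of the updated mixture, while the third domain, which gains nothing, is down-weighted. In practice such a step would be interleaved with training both models; the schedule for that is not specified in the abstract.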
