CL AI LG | Mar 10, 2025

DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs

Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, Se-Young Yun

arXiv:2503.07067v2 | 44 citations | h-index: 11 | ICML

Originality: Incremental advance

AI Analysis

This work aims to improve LLM distillation by exploiting the synergy between loss formulations and data types (teacher-generated versus student-generated responses), with applications that include preference alignment and vision-language extensions. It is rated an incremental advance.

The paper tackles the problem of suboptimal performance in distilling large language models (LLMs) by proposing DistiLLM-2, a contrastive approach that increases the likelihood of teacher responses and decreases that of student responses, resulting in high-performing student models across tasks like instruction-following and code generation.

Despite the success of distillation in large language models (LLMs), most prior work applies identical loss functions to both teacher- and student-generated data. These strategies overlook the synergy between loss formulations and data types, leading to a suboptimal performance boost in student models. To address this, we propose DistiLLM-2, a contrastive approach that simultaneously increases the likelihood of teacher responses and decreases that of student responses by harnessing this synergy. Our extensive experiments show that DistiLLM-2 not only builds high-performing student models across a wide range of tasks, including instruction-following and code generation, but also supports diverse applications, such as preference alignment and vision-language extensions. These findings highlight the potential of a contrastive approach to enhance the efficacy of LLM distillation by effectively aligning teacher and student models across varied data types.
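The contrastive idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact loss, so everything below is an assumption made for illustration: forward KL is used on teacher-generated responses (pulling student probability toward tokens the teacher favors) and reverse KL on student-generated responses (penalizing student probability on tokens the teacher considers unlikely). The function names `softmax`, `kl`, `mean_kl`, and `contrastive_distill_loss` are hypothetical and are not taken from the paper or its code.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def mean_kl(p_logits_seq, q_logits_seq):
    """Average per-token KL(softmax(p) || softmax(q)) over a response."""
    terms = [kl(softmax(p), softmax(q))
             for p, q in zip(p_logits_seq, q_logits_seq)]
    return sum(terms) / len(terms)

def contrastive_distill_loss(teacher_on_teacher_resp, student_on_teacher_resp,
                             teacher_on_student_resp, student_on_student_resp):
    """Illustrative loss (an assumption, not the paper's formulation).

    Each argument is a list of per-token logit vectors produced by the
    named model on the named response.
    """
    # Teacher-generated response: forward KL(teacher || student) is large
    # when the student gives low probability to tokens the teacher favors.
    pull = mean_kl(teacher_on_teacher_resp, student_on_teacher_resp)
    # Student-generated response: reverse KL(student || teacher) is large
    # when the student favors tokens the teacher considers unlikely.
    push = mean_kl(student_on_student_resp, teacher_on_student_resp)
    return pull + push
```

As a usage note, the loss is zero when the student's logits match the teacher's on both responses and grows as the two distributions diverge. Applying a different divergence to each data type is one way to read the abstract's "synergy between loss formulations and data types"; the paper should be consulted for the actual objective.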

