CL · Feb 1

Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs

arXiv:2602.01064v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses efficiency and knowledge-conflict issues in deploying lightweight LLMs, but the advance is incremental because it builds on existing distillation techniques.

The paper tackles knowledge conflicts and high resource demands in multi-teacher knowledge distillation for LLMs. It introduces knowledge purification, which consolidates the rationales from multiple teachers into a single rationale, and shows that the proposed methods improve the distilled model's performance and alleviate conflicts.

Knowledge distillation has emerged as a pivotal technique for transferring knowledge from stronger large language models (LLMs) to smaller, more efficient models. However, traditional distillation approaches face challenges related to knowledge conflicts and high resource demands, particularly when leveraging multiple teacher models. In this paper, we introduce the concept of Knowledge Purification, which consolidates the rationales from multiple teacher LLMs into a single rationale, thereby mitigating conflicts and enhancing efficiency. To investigate the effectiveness of knowledge purification, we further propose five purification methods from various perspectives. Our experiments demonstrate that these methods not only improve the performance of the distilled model but also effectively alleviate knowledge conflicts. Moreover, router-based methods exhibit robust generalization capabilities, underscoring the potential of innovative purification techniques in optimizing multi-teacher distillation and facilitating the practical deployment of powerful yet lightweight models.
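
To make the idea of consolidating several teacher rationales into one more concrete, here is a minimal Python sketch. It is not one of the paper's five purification methods, which the abstract does not describe. It assumes a naive rule: keep the answer most teachers agree on, then keep the shortest rationale that supports it. The `TeacherOutput` type, the `purify_by_agreement` function, and the teacher names are hypothetical and exist only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TeacherOutput:
    """One teacher's response (hypothetical structure, not from the paper)."""
    teacher: str
    rationale: str
    answer: str


def purify_by_agreement(outputs):
    """Reduce several teacher outputs to a single one.

    Assumed rule: pick the most common answer (ties go to the answer that
    appears first), then return the shortest rationale supporting it.
    """
    if not outputs:
        raise ValueError("need at least one teacher output")
    counts = Counter(o.answer for o in outputs)
    top = max(counts.values())
    # First answer, in input order, that reaches the highest vote count.
    majority_answer = next(o.answer for o in outputs if counts[o.answer] == top)
    supporters = [o for o in outputs if o.answer == majority_answer]
    # min() keeps the earliest supporter when rationale lengths tie.
    return min(supporters, key=lambda o: len(o.rationale))


# Toy example with a conflicting teacher (teacher_b disagrees with the others).
outputs = [
    TeacherOutput("teacher_a",
                  "17 has no divisors other than 1 and itself, so it is prime.",
                  "yes"),
    TeacherOutput("teacher_b",
                  "17 is odd and ends in 7, so it is composite.",
                  "no"),
    TeacherOutput("teacher_c",
                  "Checking 2, 3, and 4 shows none of them divides 17, which "
                  "rules out every candidate up to its square root, so it is prime.",
                  "yes"),
]
purified = purify_by_agreement(outputs)
print(purified.teacher, "->", purified.rationale)
```

Under this assumed rule the student would be trained on a single rationale per example instead of three, which removes the conflicting signal from `teacher_b` and cuts the amount of rationale text the student has to fit.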

