CL, Jan 13, 2024

Knowledge Distillation of Black-Box Large Language Models

Hongzhan Chen, Ruijun Chen, Yuqi Yi, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang

arXiv:2401.07013v2, 4.84 citations, h-index: 22

Originality: Highly original

AI Analysis

This work addresses the challenge of using proprietary LLMs to improve smaller models, offering a novel solution for researchers and practitioners in AI and NLP.

The paper tackles knowledge distillation from black-box large language models (LLMs) by introducing Proxy-KD, a method that uses a proxy model to transfer knowledge efficiently. The authors report that it improves black-box distillation and surpasses traditional white-box techniques.

Given the exceptional performance of proprietary large language models (LLMs) like GPT-4, recent research has increasingly focused on boosting the capabilities of smaller models through knowledge distillation (KD) from these powerful yet black-box teachers. While leveraging the high-quality outputs of these teachers is advantageous, the inaccessibility of their internal states often limits effective knowledge transfer. To overcome this limitation, we introduce Proxy-KD, a novel method that uses a proxy model to facilitate the efficient transfer of knowledge from black-box LLMs to smaller models. Our experiments show that Proxy-KD not only enhances the performance of KD from black-box teacher models but also surpasses traditional white-box KD techniques. This approach presents a compelling new avenue for distilling knowledge from advanced LLMs.
