Targeted Error Correction in Knowledge Distillation: Small Language Models Surpass GPT
This work addresses cost efficiency and data privacy issues for users of open-source LLMs in downstream applications, though it appears incremental as it builds on existing knowledge distillation techniques.
The paper tackles the problem of improving smaller language models for customer service summarization by introducing an Analyze-Revise-Finetune (ARF) pipeline, which enables a Llama 3.1 8B model to outperform GPT-3.5.
We introduce an Analyze-Revise-Finetune (ARF) pipeline that enables smaller open-source large language models (LLMs) to surpass substantially larger proprietary models in customer service summarization tasks. The pipeline first analyzes and categorizes common errors in summaries produced by a teacher model (GPT-3.5), then performs a targeted revision using a compact editor model (Llama 3.1 70B) to generate high-quality, refined training data. Fine-tuning a smaller student model (Llama 3.1 8B) on this refined data yields better summarization performance than GPT-3.5. The ARF pipeline improves cost efficiency and data privacy while maintaining competitive accuracy, illustrating a generalizable framework for enhancing open-source LLMs across diverse downstream applications.
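The data flow of the three ARF stages can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Example`, `analyze`, `revise`, `to_finetune_records`, the `"verbosity"` error category, the prompt wording, and the toy stand-ins for the model calls are all assumptions made for illustration. In a real setup, `detect_errors` and `edit` would presumably wrap calls to the analysis step and to the editor model (Llama 3.1 70B), and the resulting records would be used to fine-tune the student (Llama 3.1 8B).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    transcript: str  # customer service conversation
    summary: str     # summary written by the teacher model


def analyze(examples: List[Example],
            detect_errors: Callable[[Example], List[str]]) -> List[List[str]]:
    """Analyze: label each teacher summary with the error categories it contains."""
    return [detect_errors(ex) for ex in examples]


def revise(examples: List[Example],
           errors: List[List[str]],
           edit: Callable[[Example, List[str]], str]) -> List[Example]:
    """Revise: have the editor rewrite only the summaries that were flagged."""
    revised = []
    for ex, errs in zip(examples, errors):
        new_summary = edit(ex, errs) if errs else ex.summary
        revised.append(Example(ex.transcript, new_summary))
    return revised


def to_finetune_records(examples: List[Example]) -> List[Dict[str, str]]:
    """Finetune (data prep only): emit prompt/completion pairs for the student."""
    return [
        {"prompt": f"Summarize the conversation:\n{ex.transcript}",
         "completion": ex.summary}
        for ex in examples
    ]


# Toy stand-ins for the model calls (hypothetical, not the paper's prompts).
MAX_WORDS = 8


def toy_detect_errors(ex: Example) -> List[str]:
    # Flags a single made-up category when the summary is too long.
    return ["verbosity"] if len(ex.summary.split()) > MAX_WORDS else []


def toy_edit(ex: Example, errs: List[str]) -> str:
    # Crude "revision": truncate the summary if it was flagged as verbose.
    words = ex.summary.split()
    return " ".join(words[:MAX_WORDS]) if "verbosity" in errs else ex.summary


teacher_data = [
    Example("Customer: My card was declined. Agent: I have reset it.",
            "Card declined; agent reset it."),
    Example("Customer: Where is my order? Agent: It ships tomorrow.",
            "The customer asked about the order and the agent said "
            "that it ships tomorrow."),
]

errors = analyze(teacher_data, toy_detect_errors)
refined = revise(teacher_data, errors, toy_edit)
records = to_finetune_records(refined)
```

Passing the model calls in as plain callables keeps the sketch runnable without any model access and makes the targeted nature of the revision visible: summaries with no flagged errors pass through unchanged, and only flagged ones are rewritten before fine-tuning.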