LG · CL · Dec 15, 2023

Student as an Inherent Denoiser of Noisy Teacher

arXiv:2312.10185v1 · 3 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of noisy teacher labels in knowledge distillation for low-data regimes and offers a method for training specialized models more efficiently.

The study examines noisy pseudo labels in knowledge distillation from large language models and finds that the student model inherently denoises them: its predictions are more accurate than the teacher labels it was trained on. Building on this, it proposes Peer-Advised KD, which outperforms the LLM teacher by approximately 5% with only 50 human-labeled examples and is competitive with standard supervised fine-tuning with 750 human-labeled examples.

Knowledge distillation (KD) has been widely employed to transfer knowledge from a large language model (LLM) to a specialized model in low-data regimes through pseudo label learning. However, pseudo labels generated by teacher models are usually noisy and may influence KD performance. This study delves into KD with noisy teachers and uncovers that the student model can already generate more accurate predictions than the teacher labels used to train it during KD, indicating its inherent ability to denoise noisy teacher labels. Motivated by this finding, we propose Peer-Advised KD to improve vanilla KD from noisy teachers. Experiments show that Peer-Advised KD can outperform the LLM by approximately 5% with 50 human-labeled data, and is even competitive with standard supervised finetuning with 750 human-labeled data.
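
The denoising observation in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's setup and does not implement Peer-Advised KD. It assumes a synthetic two-dimensional task, a hypothetical "teacher" whose labels are the true labels flipped independently with a fixed probability, and a logistic-regression "student" trained only on those noisy labels. The function name `simulate_denoising` and all of its parameters are illustrative choices.

```python
import numpy as np


def simulate_denoising(n=2000, noise_rate=0.3, seed=0, steps=500, lr=0.5):
    """Train a student on noisy teacher labels; return (teacher_acc, student_acc).

    Both accuracies are measured against the true labels on the same inputs
    the student was trained on. All settings here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Synthetic inputs with a linearly separable true labeling rule.
    X = rng.normal(size=(n, 2))
    y_true = (X[:, 0] + X[:, 1] > 0).astype(float)

    # Hypothetical noisy teacher: flip each true label with prob. noise_rate.
    flip = rng.random(n) < noise_rate
    y_teacher = np.where(flip, 1.0 - y_true, y_true)

    # Student: logistic regression fit by gradient descent on teacher labels only.
    w = np.zeros(2)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y_teacher          # gradient of the cross-entropy w.r.t. logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()

    y_student = (X @ w + b > 0).astype(float)

    teacher_acc = float((y_teacher == y_true).mean())
    student_acc = float((y_student == y_true).mean())
    return teacher_acc, student_acc


if __name__ == "__main__":
    teacher_acc, student_acc = simulate_denoising()
    print(f"teacher label accuracy: {teacher_acc:.3f}")
    print(f"student accuracy:       {student_acc:.3f}")
```

Under these assumptions the student's predictions agree with the true labels more often than the teacher labels do, because independent label flips partly cancel out when a simple model is fit to many examples. Real LLM pseudo labels are unlikely to follow such a symmetric noise pattern, so this only conveys the intuition.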

