IR | CL | Sep 27, 2022

PROD: Progressive Distillation for Dense Retrieval

Microsoft
arXiv:2209.13335v3 | 33 citations | h-index: 66
Originality: Incremental advance
AI Analysis

This work addresses a key bottleneck in making dense retrieval systems more efficient. The contribution is incremental, since it builds on existing distillation techniques.

The paper tackles a known failure mode of knowledge distillation for dense retrieval: a stronger teacher can yield a worse student because of the capability gap between the two. It proposes PROD, a progressive distillation method that achieves state-of-the-art results among distillation methods for dense retrieval on five benchmarks, including MS MARCO Passage and Natural Questions.

Knowledge distillation is an effective way to transfer knowledge from a strong teacher to an efficient student model. Ideally, we expect the better the teacher is, the better the student. However, this expectation does not always come true. It is common that a better teacher model results in a bad student via distillation due to the nonnegligible gap between teacher and student. To bridge the gap, we propose PROD, a PROgressive Distillation method, for dense retrieval. PROD consists of a teacher progressive distillation and a data progressive distillation to gradually improve the student. We conduct extensive experiments on five widely-used benchmarks, MS MARCO Passage, TREC Passage 19, TREC Document 19, MS MARCO Document and Natural Questions, where PROD achieves the state-of-the-art within the distillation methods for dense retrieval. The code and models will be released.
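To make the teacher-student gap concrete, here is a minimal sketch of a generic distillation objective for retrieval. It is not taken from the paper: the abstract does not state PROD's loss, so the function names (`softmax`, `kd_loss`), the temperature parameter, and the choice of KL divergence over one query's candidate passages are all assumptions for illustration. A progressive scheme would presumably apply an objective of this kind over successive stages rather than once against the strongest teacher.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw relevance scores into a probability distribution."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over one query's candidate passages.

    Hypothetical helper: a common distillation loss, assumed here
    for illustration and not confirmed as the loss PROD uses.
    """
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Made-up scores for three candidate passages of a single query.
    teacher = [3.0, 1.0, 0.0]
    close_student = [2.5, 1.0, 0.2]
    far_student = [0.0, 1.0, 3.0]
    print("close student loss:", kd_loss(teacher, close_student))
    print("far student loss:  ", kd_loss(teacher, far_student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two rankings diverge, which is one way to quantify the gap the abstract refers to.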

Code Implementations: 1 repo