CL · May 16, 2022

Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt

arXiv:2205.07523v1 · 13 citations · h-index: 67
Originality: Highly original
AI Analysis

This work addresses a key bottleneck in DFKD for accelerating pre-trained language models, offering a more effective approach for scenarios where original training data is unavailable.

The paper tackles the problem of biased synthetic data generation in data-free knowledge distillation (DFKD). It proposes PromptDFD, a method that uses a pre-trained generative model and a reinforced topic prompter to produce semantically correct and thematically relevant samples. In some cases, it achieves results comparable to data-driven distillation.

Data-free knowledge distillation (DFKD) conducts knowledge distillation without depending on the original training data, and has recently achieved impressive results in accelerating pre-trained language models. At the heart of DFKD is the reconstruction of a synthetic dataset by inverting the parameters of the uncompressed model. Prior DFKD approaches, however, have largely relied on hand-crafted priors of the target data distribution for the reconstruction; such priors are inevitably biased and often fail to capture the intrinsic distributions. To address this problem, we propose a prompt-based method, termed PromptDFD, that allows us to take advantage of learned language priors, which steer the synthetic sentences toward being semantically and grammatically correct. Specifically, PromptDFD leverages a pre-trained generative model to provide language priors and introduces a reinforced topic prompter to control data synthesis, making the generated samples thematically relevant and semantically plausible, and thus friendly to downstream tasks. As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements in distillation performance. In some cases, PromptDFD even yields results on par with those of data-driven knowledge distillation with access to the original training data.
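
To make the distillation step concrete, here is a minimal sketch of the standard temperature-softened distillation loss that a student could minimize on one synthetic sample. The abstract does not state the exact objective PromptDFD uses, so the KL form, the `temperature` value, the function names, and the example logits are all illustrative assumptions rather than the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    This is the common soft-label distillation loss; using it here is an
    assumption, since the abstract does not specify the objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for one synthetic sentence on a 3-class task.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(teacher, student)
print(f"distillation loss: {loss:.4f}")
```

In a data-free setting, the logits would come from running the teacher and the student on generated sentences instead of on the original training set; the loss is zero only when the two softened distributions match.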

