LG CV | Jul 19, 2025

Generative Distribution Distillation

Jiequan Cui, Beier Zhu, Qingshan Xu, Xiaogang Xu, Pengguang Chen, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong

arXiv:2507.14503v17.11 citations | h-index: 12

Originality Highly original

AI Analysis

This work addresses efficient knowledge transfer in machine learning across balanced, imbalanced, and unlabeled data settings; it represents an incremental advancement with specific gains.

The paper tackles knowledge distillation by formulating it as a conditional generative problem and proposes the Generative Distribution Distillation (GenDD) framework. In the unsupervised setting, GenDD surpasses a KL baseline by 16.29% on the ImageNet validation set. With label supervision, a ResNet-50 reaches 82.28% top-1 accuracy on ImageNet after 600 epochs of training, which the authors report as a new state of the art.

In this paper, we formulate knowledge distillation (KD) as a conditional generative problem and propose the \textit{Generative Distribution Distillation (GenDD)} framework. A naive \textit{GenDD} baseline encounters two major challenges: the curse of high-dimensional optimization and the lack of semantic supervision from labels. To address these issues, we introduce a \textit{Split Tokenization} strategy, which achieves stable and effective unsupervised KD. Additionally, we develop the \textit{Distribution Contraction} technique to integrate label supervision into the reconstruction objective. Our theoretical proof demonstrates that \textit{GenDD} with \textit{Distribution Contraction} serves as a gradient-level surrogate for multi-task learning, realizing efficient supervised training without an explicit classification loss on multi-step sampling image representations. To evaluate the effectiveness of our method, we conduct experiments on balanced, imbalanced, and unlabeled data. Experimental results show that \textit{GenDD} performs competitively in the unsupervised setting, significantly surpassing the KL baseline by \textbf{16.29\%} on the ImageNet validation set. With label supervision, our ResNet-50 achieves \textbf{82.28\%} top-1 accuracy on ImageNet with 600 epochs of training, establishing a new state-of-the-art.
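The abstract compares against a "KL baseline" without specifying it. For orientation, the following is a minimal sketch of the common logit-based distillation loss, the KL divergence between temperature-softened teacher and student predictions. The function names, the `temperature` parameter, and the `temperature ** 2` scaling are assumptions drawn from standard KD practice, not details confirmed by this paper, and this is not the GenDD objective itself.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def kd_kl_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on softened distributions.

    Assumption: scaled by temperature**2, as is conventional in
    logit-based KD; the paper's exact baseline may differ.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return (temperature ** 2) * kl


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    print(kd_kl_loss(teacher, student, temperature=4.0))
```

The loss is zero when the student reproduces the teacher's softened distribution and positive otherwise. Without labels, such a baseline matches only the teacher's output distribution, which is the unsupervised setting in which the abstract reports its 16.29% gain.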
