CV · May 4, 2023

Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty

Yuan Zhang, Weihua Chen, Yichen Lu, Tao Huang, Xiuyu Sun, Jian Cao

arXiv:2305.02722v47.612 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the computational cost problem in knowledge distillation for researchers and practitioners, offering an incremental improvement over existing methods.

The paper tackles the inefficiency of training multiple teacher models for knowledge distillation by introducing Avatars, which are inference ensemble models derived from a single teacher, and proposes an uncertainty-aware factor to adaptively adjust their contributions. The method achieves up to 0.7 AP gains on COCO 2017 for object detection and 1.83 mIoU gains on Cityscapes for semantic segmentation without extra computational cost.

Knowledge distillation is an effective paradigm for boosting the performance of pocket-size models, and when multiple teacher models are available, the student can break its upper limit again. However, it is not economical to train diverse teacher models for a one-off distillation. In this paper, we introduce a new concept dubbed Avatars for distillation, which are inference ensemble models derived from the teacher. Concretely, (1) for each iteration of distillation training, various Avatars are generated by a perturbation transformation. We validate that Avatars have a higher upper limit of working capacity and teaching ability, helping the student model learn diverse and receptive knowledge perspectives from the teacher model. (2) During distillation, we propose an uncertainty-aware factor, derived from the variance of statistical differences between the vanilla teacher and the Avatars, to adaptively adjust the Avatars' contribution to knowledge transfer. Avatar Knowledge Distillation (AKD) is fundamentally different from existing methods and refines them with the innovative view of unequal training. Comprehensive experiments demonstrate the effectiveness of our Avatars mechanism, which improves state-of-the-art distillation methods for dense prediction without extra computational cost. AKD brings gains of up to 0.7 AP on COCO 2017 for object detection and 1.83 mIoU on Cityscapes for semantic segmentation. Code is available at https://github.com/Gumpest/AvatarKD.
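The two steps in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). The abstract specifies neither the perturbation transformation nor the exact form of the uncertainty-aware factor, so the dropout-style masking in `make_avatars`, the `exp(-variance)` weighting in `uncertainty_weights`, and all function names here are assumptions chosen only to make the idea concrete on flat feature vectors.

```python
import math
import random


def make_avatars(teacher_feat, num_avatars, drop_prob, rng):
    """Create perturbed copies of one teacher feature vector.

    Assumption: the perturbation is dropout-style masking with rescaling.
    """
    keep = 1.0 - drop_prob
    return [
        [x / keep if rng.random() < keep else 0.0 for x in teacher_feat]
        for _ in range(num_avatars)
    ]


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def uncertainty_weights(teacher_feat, avatars):
    """Hypothetical uncertainty factor: an Avatar whose difference from the
    vanilla teacher has higher variance receives a smaller weight."""
    raw = []
    for avatar in avatars:
        diffs = [a - t for a, t in zip(avatar, teacher_feat)]
        raw.append(math.exp(-variance(diffs)))
    total = sum(raw)
    return [r / total for r in raw]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def akd_style_loss(student_feat, teacher_feat, avatars, weights):
    """Teacher imitation term plus a weighted sum of Avatar imitation terms."""
    teacher_term = mse(student_feat, teacher_feat)
    avatar_term = sum(w * mse(student_feat, av) for w, av in zip(weights, avatars))
    return teacher_term + avatar_term


rng = random.Random(0)  # fixed seed so the sketch is reproducible
teacher = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
student = [0.4, -0.8, 1.7, 0.1, 1.2, -0.3]
avatars = make_avatars(teacher, num_avatars=3, drop_prob=0.2, rng=rng)
weights = uncertainty_weights(teacher, avatars)
loss = akd_style_loss(student, teacher, avatars, weights)
print(weights, loss)
```

Because the Avatars are regenerated from the single teacher at every iteration, no additional teacher models need to be trained; in this sketch the only added work is the masking and the per-Avatar weighting.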
