CVNov 29, 2021

Improved Knowledge Distillation via Adversarial Collaboration

arXiv:2111.14356v14 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of knowledge loss in distillation for machine learning practitioners, offering an incremental improvement over existing proxy-based methods.

The paper tackles the capacity gap problem in knowledge distillation by proposing an adversarial collaborative method that enhances the student model's performance, achieving superior results on four classification tasks.

Knowledge distillation has become an important approach to obtain a compact yet effective model. To achieve this goal, a small student model is trained to exploit the knowledge of a large well-trained teacher model. However, due to the capacity gap between the teacher and the student, the student's performance is hard to reach the level of the teacher. Regarding this issue, existing methods propose to reduce the difficulty of the teacher's knowledge via a proxy way. We argue that these proxy-based methods overlook the knowledge loss of the teacher, which may cause the student to encounter capacity bottlenecks. In this paper, we alleviate the capacity gap problem from a new perspective with the purpose of averting knowledge loss. Instead of sacrificing part of the teacher's knowledge, we propose to build a more powerful student via adversarial collaborative learning. To this end, we further propose an Adversarial Collaborative Knowledge Distillation (ACKD) method that effectively improves the performance of knowledge distillation. Specifically, we construct the student model with multiple auxiliary learners. Meanwhile, we devise an adversarial collaborative module (ACM) that introduces attention mechanism and adversarial learning to enhance the capacity of the student. Extensive experiments on four classification tasks show the superiority of the proposed ACKD.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes