CV
May 25, 2023

Triplet Knowledge Distillation

Xijun Wang, Dongyang Liu, Meina Kan, Chunrui Han, Zhongqin Wu, Shiguang Shan

arXiv:2305.15975v1
2.83 citations

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in knowledge distillation for machine learning practitioners, offering an incremental improvement over existing methods.

The paper tackles a known difficulty in knowledge distillation: when the teacher is much larger than the student, the student struggles to mimic it. It introduces TriKD, a triplet mechanism in which a pre-trained anchor model delimits an easy-to-learn subspace that constrains the teacher. The authors report improved student performance on image classification and face recognition.

In Knowledge Distillation, the teacher is generally much larger than the student, making the solution of the teacher likely to be difficult for the student to learn. To ease the mimicking difficulty, we introduce a triplet knowledge distillation mechanism named TriKD. Besides teacher and student, TriKD employs a third role called anchor model. Before distillation begins, the pre-trained anchor model delimits a subspace within the full solution space of the target problem. Solutions within the subspace are expected to be easy targets that the student could mimic well. Distillation then begins in an online manner, and the teacher is only allowed to express solutions within the aforementioned subspace. Surprisingly, benefiting from accurate but easy-to-mimic hints, the student can finally perform well. After the student is well trained, it can be used as the new anchor for new students, forming a curriculum learning strategy. Our experiments on image classification and face recognition with various models clearly demonstrate the effectiveness of our method. Furthermore, the proposed TriKD is also effective in dealing with the overfitting issue. Moreover, our theoretical analysis supports the rationality of our triplet distillation.
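The abstract describes the three roles but does not state the loss, so the following is only a minimal sketch of how such a triplet objective could look, not the authors' formulation. It assumes that the frozen anchor constrains the teacher through a KL term and that the student mimics the teacher through a second KL term while both fit the labels. The function name `triplet_losses` and the parameters `anchor_weight`, `mimic_weight`, and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class index."""
    return -math.log(probs[label] + eps)


def triplet_losses(anchor_logits, teacher_logits, student_logits, label,
                   anchor_weight=1.0, mimic_weight=1.0, temperature=1.0):
    """Return (teacher_loss, student_loss) for a single example.

    Assumed form, not taken from the paper:
      teacher_loss = CE(teacher, label) + anchor_weight * KL(anchor || teacher)
      student_loss = CE(student, label) + mimic_weight  * KL(teacher || student)
    The anchor is treated as frozen; teacher and student would be updated
    together, matching the "online" distillation the abstract mentions.
    """
    p_anchor = softmax(anchor_logits, temperature)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # The teacher fits the label while staying close to the anchor's output.
    teacher_loss = (cross_entropy(softmax(teacher_logits), label)
                    + anchor_weight * kl_divergence(p_anchor, p_teacher))
    # The student fits the label while mimicking the constrained teacher.
    student_loss = (cross_entropy(softmax(student_logits), label)
                    + mimic_weight * kl_divergence(p_teacher, p_student))
    return teacher_loss, student_loss
```

Under these assumptions, a teacher whose predictions drift away from the anchor pays a larger penalty, which is one way to read "the teacher is only allowed to express solutions within the aforementioned subspace". In a real training loop the two losses would be backpropagated through the teacher and student networks respectively.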
