CORSD: Class-Oriented Relational Self Distillation
The paper addresses a specific problem in model compression for computer vision by improving relational knowledge transfer, though the contribution appears incremental because it builds on existing distillation methods.
The paper tackles limitations in knowledge distillation by proposing CORSD, a self-distillation framework that uses trainable relation networks and auxiliary classifiers to transfer relational knowledge, with reported average accuracy gains of 3.8%, 1.5%, and 4.5% on CIFAR100, ImageNet, and CUB-200-2011, respectively.
Knowledge distillation is an effective model compression method, but it has some limitations: (1) feature-based distillation methods focus only on distilling the feature map and do not transfer the relations among data examples; (2) relational distillation methods are either limited to handcrafted functions for relation extraction, such as the L2 norm, or weak in modeling inter- and intra-class relations. In addition, the feature divergence between heterogeneous teacher-student architectures may lead to inaccurate transfer of relational knowledge. In this work, we propose a novel training framework named Class-Oriented Relational Self Distillation (CORSD) to address these limitations. Trainable relation networks are designed to extract the relations of structured data input, and they enable the whole model to better classify samples by transferring relational knowledge from the deepest layer of the model to the shallow layers. In addition, auxiliary classifiers are proposed so that the relation networks capture class-oriented relations that benefit the classification task. Experiments demonstrate that CORSD achieves remarkable improvements. Compared to the baseline, average accuracy boosts of 3.8%, 1.5% and 4.5% are observed on CIFAR100, ImageNet and CUB-200-2011, respectively.
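
To make the idea of relational transfer concrete, the following is a minimal sketch of the handcrafted baseline the abstract contrasts against: a pairwise L2-distance relation computed within a batch, with the relation matrix of the deepest layer used as the target for a shallow layer. It is not the CORSD method itself, which replaces the fixed distance with trainable relation networks and adds auxiliary classifiers. The function names, the mean-distance normalization, and the squared-error loss are illustrative assumptions that do not come from the paper.

```python
import math


def pairwise_l2(features):
    """Return the n x n matrix of Euclidean distances between feature vectors."""
    n = len(features)
    return [[math.dist(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def normalize(matrix):
    """Divide by the mean off-diagonal distance (an assumed choice) so that
    relation matrices from layers with different scales are comparable."""
    n = len(matrix)
    off_diag = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    mean = sum(off_diag) / len(off_diag) if off_diag else 0.0
    if mean == 0.0:
        return [row[:] for row in matrix]
    return [[value / mean for value in row] for row in matrix]


def relation_distill_loss(shallow, deep):
    """Mean squared difference between the shallow-layer relation matrix and
    the deepest-layer relation matrix, which acts as the target here."""
    rel_shallow = normalize(pairwise_l2(shallow))
    rel_deep = normalize(pairwise_l2(deep))
    n = len(rel_shallow)
    total = sum((rel_shallow[i][j] - rel_deep[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)


# Toy batch of three samples: deep features are 2-D, shallow features are 3-D.
deep_feats = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
shallow_same_structure = [[0.0, 0.0, 0.0], [1.5, 2.0, 0.0], [3.0, 4.0, 0.0]]
shallow_other_structure = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

loss_same = relation_distill_loss(shallow_same_structure, deep_feats)
loss_other = relation_distill_loss(shallow_other_structure, deep_feats)
```

Because the loss compares n x n relation matrices rather than the features themselves, the shallow and deep layers may have different feature dimensions. In this toy batch, `loss_same` is essentially zero because the shallow features are a scaled copy of the deep ones, while `loss_other` is positive because the pairwise structure differs.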