CanKD: Cross-Attention-based Non-local operation for Feature-based Knowledge Distillation
This work addresses the challenge of improving knowledge distillation for computer vision tasks, offering a novel approach that could enhance model compression and efficiency, though it appears incremental in the context of attention-guided distillation methods.
The paper tackles feature-based knowledge distillation by proposing CanKD, a framework that uses cross-attention for non-local knowledge transfer between teacher and student feature maps. The authors report that it outperforms state-of-the-art feature and hybrid distillation methods on object detection and image segmentation tasks.
We propose Cross-Attention-based Non-local Knowledge Distillation (CanKD), a novel feature-based knowledge distillation framework that leverages cross-attention mechanisms to enhance the knowledge transfer process. Unlike traditional self-attention-based distillation methods that align teacher and student feature maps independently, CanKD enables each pixel in the student feature map to dynamically consider all pixels in the teacher feature map. This non-local knowledge transfer more thoroughly captures pixel-wise relationships, improving feature representation learning. Our method introduces only an additional loss function to achieve superior performance compared with existing attention-guided distillation methods. Extensive experiments on object detection and image segmentation tasks demonstrate that CanKD outperforms state-of-the-art feature and hybrid distillation methods. These experimental results highlight CanKD's potential as a new paradigm for attention-guided distillation in computer vision tasks. Code is available at https://github.com/tori-hotaru/CanKD
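To make the cross-attention idea concrete, the following is a minimal sketch rather than the authors' implementation (see the linked repository for that). It assumes feature maps are flattened to lists of N pixels with C channels each, uses student pixels as queries and teacher pixels as keys and values, and omits any learned projections or normalization. The residual addition and the mean-squared-error target are assumptions borrowed from standard non-local blocks, and the function names `cross_attention` and `distill_loss` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(student, teacher):
    """Each student pixel (query) attends over ALL teacher pixels (keys/values).

    student, teacher: lists of N pixels, each a list of C channel values.
    Returns one attended C-dimensional vector per student pixel.
    """
    channels = len(teacher[0])
    scale = 1.0 / math.sqrt(len(student[0]))
    attended = []
    for q in student:
        # Scaled dot-product similarity between this student pixel and every teacher pixel.
        scores = [scale * sum(qc * kc for qc, kc in zip(q, k)) for k in teacher]
        weights = softmax(scores)
        # Weighted sum of teacher pixels: the non-local context for this student pixel.
        attended.append(
            [sum(w * v[c] for w, v in zip(weights, teacher)) for c in range(channels)]
        )
    return attended


def distill_loss(student, teacher):
    """Assumed loss: MSE between the residual-enhanced student map and the teacher map."""
    attended = cross_attention(student, teacher)
    # Residual connection (assumption, as in standard non-local blocks).
    enhanced = [[s + a for s, a in zip(sp, ap)] for sp, ap in zip(student, attended)]
    count = len(teacher) * len(teacher[0])
    squared = sum(
        (e - t) ** 2 for ep, tp in zip(enhanced, teacher) for e, t in zip(ep, tp)
    )
    return squared / count
```

In a training loop, a term like this would be added to the student's task loss with some weighting factor, which matches the abstract's statement that the method introduces only an additional loss function. The weighting and the layers at which features are taken are not specified here.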