Pixel-Wise Contrastive Distillation
This work addresses the challenge of efficiently pre-training small models for dense prediction tasks such as object detection and segmentation, though the contribution appears incremental because it builds on existing distillation methods.
The paper tackles self-supervised distillation for dense prediction tasks by proposing Pixel-Wise Contrastive Distillation (PCD), which achieves 37.4 bounding-box AP and 34.0 mask AP on COCO with a ResNet-18-FPN backbone and a Mask R-CNN detector.
We present a simple but effective pixel-level self-supervised distillation framework friendly to dense prediction tasks. Our method, called Pixel-Wise Contrastive Distillation (PCD), distills knowledge by attracting the corresponding pixels in the student's and teacher's output feature maps. PCD includes a novel design called SpatialAdaptor, which ``reshapes'' a part of the teacher network while preserving the distribution of its output features. Our ablation experiments suggest that this reshaping behavior enables more informative pixel-to-pixel distillation. Moreover, we utilize a plug-in multi-head self-attention module that explicitly relates the pixels of the student's feature maps to enlarge the effective receptive field, leading to a more competitive student. PCD \textbf{outperforms} previous self-supervised distillation methods on various dense prediction tasks. A \mbox{ResNet-18-FPN} backbone distilled by PCD achieves $37.4$ AP$^\text{bbox}$ and $34.0$ AP$^\text{mask}$ on the COCO dataset with the \mbox{Mask R-CNN} detector. We hope our study will inspire future research on how to pre-train small models that are friendly to dense prediction tasks in a self-supervised fashion.
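To make the idea of "attracting the corresponding pixels" concrete, the following is a minimal sketch of one possible pixel-wise contrastive objective. It is not taken from the paper: the InfoNCE form, the temperature value, the use of the teacher's other pixels as negatives, and the function name `pixel_contrastive_loss` are all assumptions made for illustration. Feature maps are assumed to be already flattened to lists of per-pixel vectors with matching spatial order.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pixel_contrastive_loss(student, teacher, temperature=0.2):
    """Hypothetical pixel-wise InfoNCE loss (an assumption, not the paper's exact loss).

    student, teacher: lists of per-pixel feature vectors, flattened from
    H x W feature maps in the same spatial order. For student pixel i, the
    teacher pixel i is the positive; all other teacher pixels are negatives.
    """
    if not student or len(student) != len(teacher):
        raise ValueError("student and teacher need the same, non-zero pixel count")
    s = [l2_normalize(v) for v in student]
    t = [l2_normalize(v) for v in teacher]
    total = 0.0
    for i, s_i in enumerate(s):
        logits = [dot(s_i, t_j) / temperature for t_j in t]
        # Numerically stable log-sum-exp over all teacher pixels.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax at the positive
    return total / len(s)


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    aligned = [list(v) for v in teacher]       # student matches teacher per pixel
    shifted = teacher[1:] + teacher[:1]        # student pixels spatially misaligned
    print(pixel_contrastive_loss(aligned, teacher))
    print(pixel_contrastive_loss(shifted, teacher))
```

In this sketch the loss is small when each student pixel agrees with the teacher pixel at the same location and grows when the two maps are spatially misaligned, which is one way to read why matching the spatial layout of the two feature maps matters for pixel-to-pixel distillation.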