Multi-Scale Aligned Distillation for Low-Resolution Detection
This work addresses the trade-off between efficiency and accuracy in instance-level detection tasks and offers a practical way to deploy models in resource-constrained environments, though its contribution is an incremental improvement on existing distillation techniques.
The paper tackles the problem of performance degradation in low-resolution object detection models by proposing a knowledge distillation method that aligns feature maps between multi-resolution teacher and low-resolution student networks, achieving a 2.1% to 3.6% improvement in mAP over baseline low-resolution models.
In instance-level detection tasks (e.g., object detection), reducing the input resolution is an easy way to improve runtime efficiency. However, it typically degrades detection performance substantially. This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model. We first identify the challenge of applying knowledge distillation (KD) to teacher and student networks that operate on different input resolutions. To tackle it, we explore the idea of spatially aligning feature maps between models of varying input resolutions by shifting feature pyramid positions, and we introduce aligned multi-scale training to train a multi-scale teacher that can distill its knowledge to a low-resolution student. Further, we propose crossing feature-level fusion to dynamically fuse the teacher's multi-resolution features to guide the student better. On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training, while outperforming the latter's low-resolution models by 2.1% to 3.6% in terms of mAP. Our code is made publicly available at https://github.com/dvlab-research/MSAD.
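The idea of aligning feature maps by shifting feature pyramid positions can be illustrated with a little arithmetic. The sketch below is not the authors' implementation; it assumes a feature pyramid in which level l has stride 2^l, input sizes divisible by every stride used, and a student input that is exactly half the teacher's resolution, so that a shift of one pyramid level makes the spatial sizes match. The function names `level_size` and `shifted_pairs` are hypothetical.

```python
def level_size(input_hw, level):
    """Spatial size (H, W) of a pyramid level, assuming stride = 2 ** level."""
    stride = 2 ** level
    h, w = input_hw
    return (h // stride, w // stride)


def shifted_pairs(teacher_hw, student_hw, teacher_levels, shift):
    """Pair each teacher level with the student level `shift` positions lower.

    Returns tuples of (teacher_level, student_level, teacher_size, student_size).
    The shift value is an assumption chosen to match the resolution ratio.
    """
    pairs = []
    for t_level in teacher_levels:
        s_level = t_level - shift
        pairs.append((
            t_level,
            s_level,
            level_size(teacher_hw, t_level),
            level_size(student_hw, s_level),
        ))
    return pairs


# Assumed setup: teacher sees 1024x1024, student sees 512x512 (2x smaller).
teacher_hw = (1024, 1024)
student_hw = (512, 512)

# Without a shift, the same level has mismatched sizes across the two models.
unshifted_teacher = level_size(teacher_hw, 3)
unshifted_student = level_size(student_hw, 3)

# With a one-level shift, every paired feature map has the same spatial size,
# so a feature-level distillation loss can compare them element-wise.
pairs = shifted_pairs(teacher_hw, student_hw, range(3, 8), shift=1)
for t_level, s_level, t_size, s_size in pairs:
    print(f"teacher P{t_level} {t_size} <-> student P{s_level} {s_size}")
```

Under these assumptions, the teacher's level 3 on the larger input and the student's level 2 on the smaller input cover the same spatial grid, which is what allows feature maps from models with different input resolutions to be compared directly.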