IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors
This work addresses the challenge of training efficient 1-bit detectors for computer vision applications and represents an incremental improvement in knowledge distillation methods.
The paper tackles the problem that knowledge distillation is less effective for 1-bit object detectors because of information discrepancies between teacher and student. It presents IDa-Det, which reduces this gap and achieves 76.9% mAP for a 1-bit Faster-RCNN with a ResNet-18 backbone.
Knowledge distillation (KD) has proven useful for training compact object detection models. However, we observe that KD is often effective when the teacher model and its student counterpart share similar proposal information. This explains why existing KD methods are less effective for 1-bit detectors: there is a significant information discrepancy between the real-valued teacher and the 1-bit student. This paper presents an Information Discrepancy-aware strategy (IDa-Det) for distilling 1-bit detectors that can effectively eliminate information discrepancies and significantly reduce the performance gap between a 1-bit detector and its real-valued counterpart. We formulate the distillation process as a bi-level optimization problem. At the inner level, we select the representative proposals with maximum information discrepancy. We then introduce a novel entropy distillation loss to reduce the disparity based on the selected proposals. Extensive experiments demonstrate IDa-Det's superiority over state-of-the-art 1-bit detectors and KD methods on both the PASCAL VOC and COCO datasets. IDa-Det achieves 76.9% mAP for a 1-bit Faster-RCNN with a ResNet-18 backbone. Our code is available at https://github.com/SteveTsui/IDa-Det.
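The two-step structure described above (select the proposals with the largest discrepancy, then distill on them) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the discrepancy measure or the entropy distillation loss, so the sketch assumes squared Euclidean distance as a stand-in discrepancy and a teacher-to-student cross-entropy as a stand-in loss. All function names (`discrepancy`, `select_proposals`, `distillation_loss`) are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def discrepancy(teacher_feat, student_feat):
    # ASSUMPTION: squared Euclidean distance stands in for the
    # information discrepancy, which the abstract does not define.
    return sum((t - s) ** 2 for t, s in zip(teacher_feat, student_feat))


def select_proposals(teacher_feats, student_feats, k):
    """Inner step: indices of the k proposals with the largest discrepancy."""
    scores = [discrepancy(t, s) for t, s in zip(teacher_feats, student_feats)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def distillation_loss(teacher_feats, student_feats, selected):
    # ASSUMPTION: mean cross-entropy H(p_teacher, p_student) over the
    # selected proposals stands in for the entropy distillation loss.
    if not selected:
        return 0.0
    total = 0.0
    for i in selected:
        p = softmax(teacher_feats[i])
        q = softmax(student_feats[i])
        total -= sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return total / len(selected)


# Toy per-proposal features for a teacher and a student (made-up values).
teacher = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
student = [[1.0, 0.1], [2.0, 0.0], [3.0, 2.5]]

chosen = select_proposals(teacher, student, k=2)
loss = distillation_loss(teacher, student, chosen)
print(chosen, loss)
```

In a real training loop, the selection step would be recomputed as the student changes, and the loss would be minimized with respect to the student's parameters. The sketch only shows how the two levels fit together.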