SEA: Bridging the Gap Between One- and Two-stage Detector Distillation via SEmantic-aware Alignment
This work improves object detection efficiency by enabling smaller student detectors to match or exceed their larger teachers, an incremental but practically useful gain for deployment in resource-constrained environments.
The paper tackles knowledge distillation from teacher to student detectors in both one- and two-stage frameworks by addressing pixel-level imbalance. It achieves state-of-the-art results on COCO object detection, with distilled RetinaNet and FCOS students outperforming their teachers.
We revisit the one- and two-stage detector distillation tasks and present a simple and efficient semantic-aware framework to fill the gap between them. We address the pixel-level imbalance problem by designing category anchors that produce a representative pattern for each category, and we regularize the topological distance between pixels and category anchors to further tighten their semantic bonds. We name our method SEA (SEmantic-aware Alignment) distillation because it abstracts dense, fine-grained information through semantic reliance to facilitate distillation. SEA adapts well to either detection pipeline and achieves new state-of-the-art results on the challenging COCO object detection task with both one- and two-stage detectors. Its superior performance on instance segmentation further demonstrates its generalization ability. Both 2x-distilled RetinaNet and FCOS with ResNet50-FPN outperform their corresponding 3x ResNet101-FPN teachers, reaching 40.64 and 43.06 AP, respectively. Code will be made publicly available.
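The two ingredients named in the abstract, category anchors and a pixel-to-anchor distance regularizer, can be illustrated with a minimal NumPy sketch. The abstract does not say how anchors are computed or which distance is used, so everything here is an assumption made for illustration: anchors are taken as per-category means of pixel features, the distance is Euclidean, both terms are mean squared errors, and the function names (`category_anchors`, `pixel_anchor_distances`, `sea_style_loss`) and the `weight` parameter are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def category_anchors(feats, labels, num_classes):
    """Assumed anchor definition: mean feature of all pixels in a category.

    feats:  (N, C) per-pixel features (a flattened feature map).
    labels: (N,) integer category id for each pixel.
    Returns a (num_classes, C) array; empty categories keep a zero anchor.
    """
    anchors = np.zeros((num_classes, feats.shape[1]), dtype=float)
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            anchors[k] = feats[mask].mean(axis=0)
    return anchors


def pixel_anchor_distances(feats, anchors):
    """Euclidean distance from every pixel to every anchor, shape (N, K)."""
    diff = feats[:, None, :] - anchors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sea_style_loss(t_feats, s_feats, labels, num_classes, weight=1.0):
    """Hypothetical loss: anchor alignment plus distance-structure matching."""
    t_anchor = category_anchors(t_feats, labels, num_classes)
    s_anchor = category_anchors(s_feats, labels, num_classes)
    # One anchor per category, so a category with few pixels counts as much
    # as a category with many pixels.
    anchor_term = np.mean((t_anchor - s_anchor) ** 2)
    # Ask the student to reproduce the teacher's pixel-to-anchor distances.
    t_dist = pixel_anchor_distances(t_feats, t_anchor)
    s_dist = pixel_anchor_distances(s_feats, s_anchor)
    topo_term = np.mean((t_dist - s_dist) ** 2)
    return anchor_term + weight * topo_term


# Toy example: three pixels, two categories, two feature channels.
teacher = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]])
labels = np.array([0, 0, 1])
student = teacher + 1.0  # same structure, shifted features
loss = sea_style_loss(teacher, student, labels, num_classes=2)
```

In the toy example the student features are a uniform shift of the teacher's, so the pixel-to-anchor distances agree and only the anchor term contributes. Averaging over categories rather than over pixels is one simple way to keep a few large objects or the background from dominating the loss, which is the imbalance the abstract refers to.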