Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object Detection
This work addresses the challenge of few-shot object detection for computer vision applications, offering consistent improvements when integrated into existing fine-tuning-based methods.
The paper tackles the problem of few-shot object detection by explicitly modeling class-agnostic commonalities between base and novel classes, proposing a unified distillation framework that improves performance by a large margin when integrated into existing methods.
Most existing methods for few-shot object detection follow the fine-tuning paradigm, which implicitly assumes that class-agnostic, generalizable knowledge can be learned and transferred from base classes with abundant samples to novel classes with limited samples through a two-stage training strategy. This assumption does not necessarily hold, since an object detector can hardly distinguish class-agnostic knowledge from class-specific knowledge without explicit modeling. In this work, we propose to explicitly learn three types of class-agnostic commonalities between base and novel classes: recognition-related semantic commonalities, localization-related semantic commonalities, and distribution commonalities. We design a unified distillation framework based on a memory bank, which distills all three types of commonalities jointly and efficiently. Extensive experiments demonstrate that our method can be readily integrated into most existing fine-tuning-based methods and consistently improves performance by a large margin.
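To make the idea of memory-bank-based distillation more concrete, the following is a minimal sketch of one plausible form it could take. It is an illustration under stated assumptions and not the method described in the paper: the abstract does not specify how the memory bank is updated or how the distillation targets are computed. The sketch assumes the bank holds one prototype per class as a running mean of features, that soft targets come from temperature-scaled cosine similarity to those prototypes, and that the loss is a KL divergence. The names `MemoryBank`, `soft_targets`, and `kl_divergence`, and the momentum and temperature values, are all hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


class MemoryBank:
    """Hypothetical bank: one running-mean prototype per class (an assumption)."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.prototypes = {}

    def update(self, label, feature):
        old = self.prototypes.get(label)
        if old is None:
            self.prototypes[label] = list(feature)
        else:
            m = self.momentum
            self.prototypes[label] = [
                m * o + (1 - m) * f for o, f in zip(old, feature)
            ]

    def soft_targets(self, feature, temperature=0.1):
        """Similarity of a feature to every stored prototype, as a distribution."""
        labels = sorted(self.prototypes)
        sims = [cosine(feature, self.prototypes[c]) for c in labels]
        return labels, softmax(sims, temperature)


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), used here as the assumed distillation loss."""
    return sum(
        t * math.log((t + eps) / (p + eps)) for t, p in zip(target, predicted)
    )


# Usage: two base-class prototypes and one feature from a novel-class sample.
bank = MemoryBank()
bank.update("cat", [1.0, 0.0])
bank.update("car", [0.0, 1.0])
labels, target = bank.soft_targets([0.9, 0.1])
loss = kl_divergence(target, [0.5, 0.5])  # model prediction assumed uniform
```

In this sketch the soft targets encode how strongly a novel-class sample resembles each stored class, which is one way commonality between base and novel classes could be turned into a training signal. A real detector would apply such a loss to region features inside the detection head rather than to hand-written vectors.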