Re-examining Distillation For Continual Object Detection
This work addresses continual learning for object detection models. Its contribution is incremental in that it builds on existing distillation methods to improve performance in realistic scenarios.
The paper tackles catastrophic forgetting in continual object detection by analyzing distillation-based approaches. It finds that overly confident but incorrect teacher predictions hinder learning of the classification head, and it proposes two improvements: detecting incorrect teacher predictions and replacing the mean squared error with an adaptive Huber loss. The approach is reported to be effective in both class-incremental and domain-incremental settings.
Training models continually to detect and classify objects from new classes and new domains remains an open problem. In this work, we conduct a thorough analysis of why and how object detection models forget catastrophically. We focus on distillation-based approaches in two-stage networks, the most common strategy employed in contemporary continual object detection work. Distillation aims to transfer the knowledge of a model trained on previous tasks (the teacher) to a new model (the student) while it learns the new task. We show that this works well for the region proposal network, but that wrong yet overly confident teacher predictions prevent student models from effectively learning the classification head. Our analysis provides a foundation that allows us to propose improvements to existing techniques by detecting incorrect teacher predictions, based on current ground-truth labels, and by employing an adaptive Huber loss, as opposed to the mean squared error, for the distillation loss in the classification heads. We show that our strategy works not only in a class-incremental setting but also in domain-incremental settings, which constitute a realistic context that is likely to be representative of real-world problems.
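To make the two proposed changes concrete, the following is a minimal sketch, not the authors' implementation. It computes a distillation loss over classification-head logits that (a) skips proposals where the teacher appears to be wrong and (b) uses a Huber loss instead of the mean squared error. Several details are assumptions, since the abstract does not specify them: the function names, the rule that a teacher is "incorrect" when a proposal matches a current ground-truth box yet the teacher predicts a foreground class from earlier tasks, and the choice of the median absolute residual as the "adaptive" Huber threshold.

```python
from statistics import median


def huber(residual, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def distillation_loss(teacher_logits, student_logits, gt_labels, background=0):
    """Masked Huber distillation loss for a classification head (sketch).

    teacher_logits, student_logits: one list of logits per proposal, over the
        classes known to the teacher; index `background` is the background class.
    gt_labels: one entry per proposal; the current-task label if the proposal
        matches a current ground-truth box, otherwise None.
    """
    residuals = []
    for t, s, gt in zip(teacher_logits, student_logits, gt_labels):
        # ASSUMED rule: the proposal is labelled in the current task, but the
        # teacher assigns it to an old foreground class, so the teacher is wrong.
        teacher_wrong = gt is not None and argmax(t) != background
        if not teacher_wrong:
            residuals.extend(ti - si for ti, si in zip(t, s))

    if not residuals:
        return 0.0

    # ASSUMED adaptation: threshold set to the median absolute residual.
    delta = max(median(abs(r) for r in residuals), 1e-6)
    return sum(huber(r, delta) for r in residuals) / len(residuals)


teacher = [[0.0, 4.0], [3.0, 0.0]]
student = [[0.0, 1.0], [1.0, 0.0]]
# First proposal matches a new-class box (label 2) but the teacher predicts
# old class 1, so it is excluded; the second proposal is distilled.
loss = distillation_loss(teacher, student, gt_labels=[2, None])
```

Compared with the mean squared error, the linear tail of the Huber loss limits how strongly a single large teacher-student disagreement can pull the student, which is the intuition behind replacing one with the other when teacher predictions may be confidently wrong.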