Semi-supervised Object Detection via Virtual Category Learning
This work addresses a key challenge in semi-supervised object detection for applications with costly labeled data, offering a novel approach to improve model generalization without exacerbating confirmation bias.
The paper tackles the problem of handling confusing samples in semi-supervised object detection by assigning virtual categories to these samples, allowing them to contribute to model optimization without label correction. It demonstrates significant improvements over state-of-the-art methods, particularly with limited labeled data.
Due to the cost of labelled data in real-world applications, semi-supervised object detectors underpinned by pseudo labelling are appealing. However, handling confusing samples is nontrivial: discarding valuable confusing samples would compromise model generalisation, while using them for training would exacerbate the confirmation bias caused by inevitable mislabelling. To solve this problem, this paper proposes to use confusing samples proactively without label correction. Specifically, a virtual category (VC) is assigned to each confusing sample so that it can safely contribute to model optimisation even without a concrete label. This is made possible by specifying the embedding distance between the training sample and the virtual category as the lower bound of the inter-class distance. We also modify the localisation loss to allow high-quality boundaries for location regression. Extensive experiments demonstrate that the proposed VC learning significantly surpasses the state of the art, especially with small amounts of available labels.
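The abstract does not spell out the classification loss, so the following is only a minimal sketch of one plausible reading of the virtual category idea, not the authors' implementation. It assumes (hypothetically) that the virtual category's weight vector is the sample's own embedding, and that the logits of the confused candidate classes are dropped from the softmax denominator, so the sample is pushed away only from the classes it is clearly not. The function name `virtual_category_loss` and its parameters are illustrative.

```python
import math


def virtual_category_loss(embedding, class_weights, confused_classes):
    """Softmax cross-entropy with the confused classes replaced by one virtual category.

    Assumption (not stated in the abstract): the virtual category's weight
    vector is the sample's own embedding, so its logit is the squared norm
    of the embedding.
    """
    # Logit of the virtual category: dot product of the embedding with itself.
    virtual_logit = sum(x * x for x in embedding)

    # Logits of the classes that are NOT among the confused candidates.
    other_logits = [
        sum(w * x for w, x in zip(weights, embedding))
        for idx, weights in enumerate(class_weights)
        if idx not in confused_classes
    ]

    # Numerically stable log-sum-exp over the virtual and remaining logits.
    all_logits = [virtual_logit] + other_logits
    peak = max(all_logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in all_logits))

    # Negative log-probability assigned to the virtual category.
    return log_norm - virtual_logit


# Hypothetical example: classes 0 and 1 are the confused candidates,
# class 2 is clearly wrong and is the only one the sample is pushed away from.
loss = virtual_category_loss(
    embedding=[1.0, 0.0],
    class_weights=[[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]],
    confused_classes={0, 1},
)
```

Under these assumptions the loss never asks the model to choose between the confused candidates, which is how a sample without a concrete label could still contribute to optimisation.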