CV · Apr 15, 2025

S$^2$Teacher: Step-by-step Teacher for Sparsely Annotated Oriented Object Detection

Yu Lin, Jianghang Lin, Kai Ye, You Shen, Yan Zhang, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

arXiv:2504.11111v1 · 3.6 · h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses the labor-intensive annotation burden in remote sensing object detection and offers a practical solution for applications such as aerial imagery analysis. The advance is incremental, since it builds on existing weakly and semi-supervised methods.

The paper aims to reduce annotation costs in oriented object detection for remote sensing. It introduces a sparsely annotated setting (SAOOD), in which only some instances are labeled, and proposes S$^2$Teacher, which progressively mines pseudo-labels for unlabeled objects and reweights their loss. On the DOTA dataset it reports near-fully-supervised performance with only 10% of instances annotated.

Although fully-supervised oriented object detection has made significant progress in multimodal remote sensing image understanding, it comes at the cost of labor-intensive annotation. Recent studies have explored weakly and semi-supervised learning to alleviate this burden. However, these methods overlook the difficulties posed by dense annotations in complex remote sensing scenes. In this paper, we introduce a novel setting called sparsely annotated oriented object detection (SAOOD), which only labels partial instances, and propose a solution to address its challenges. Specifically, we focus on two key issues in the setting: (1) sparse labeling leading to overfitting on limited foreground representations, and (2) unlabeled objects (false negatives) confusing feature learning. To this end, we propose the S$^2$Teacher, a novel method that progressively mines pseudo-labels for unlabeled objects, from easy to hard, to enhance foreground representations. Additionally, it reweights the loss of unlabeled objects to mitigate their impact during training. Extensive experiments demonstrate that S$^2$Teacher not only significantly improves detector performance across different sparse annotation levels but also achieves near-fully-supervised performance on the DOTA dataset with only 10% annotation instances, effectively balancing detection accuracy with annotation efficiency. The code will be public.
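
The two ideas named in the abstract, easy-to-hard pseudo-label mining and loss reweighting for unlabeled objects, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear threshold schedule, the `1 - teacher_score` weight, and all function names and default values below are assumptions chosen only to make the idea concrete.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one predicted probability p and target y."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mining_threshold(step, total_steps, start=0.9, end=0.5):
    """Hypothetical easy-to-hard schedule: begin with a strict confidence
    threshold (only easy, high-score objects become pseudo-labels) and
    relax it linearly so harder objects are admitted later."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def reweighted_loss(student_probs, labels, teacher_scores, threshold):
    """Mean classification loss over candidate locations.

    labels: 1 for an annotated object, 0 otherwise. Under sparse
    annotation, a 0 may be true background or an unlabeled object.
    teacher_scores: assumed foreground confidence from a teacher model.
    """
    total = 0.0
    for p, y, t in zip(student_probs, labels, teacher_scores):
        if y == 1:
            total += bce(p, 1)              # annotated foreground
        elif t >= threshold:
            total += bce(p, 1)              # mined pseudo-label
        else:
            # Assumed weighting: the more the teacher suspects an object,
            # the less this location is penalised as background.
            total += (1.0 - t) * bce(p, 0)
    return total / len(student_probs)
```

For a single unannotated location where the student predicts 0.8 and the teacher scores 0.7, the sketch down-weights the background penalty while the threshold is still 0.9, and treats the location as a pseudo-positive once the threshold has relaxed to 0.5.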
