CV | Apr 15, 2025

S$^2$Teacher: Step-by-step Teacher for Sparsely Annotated Oriented Object Detection

arXiv:2504.11111v1 | h-index: 12
Originality: Incremental advance
AI Analysis

The paper addresses the labor-intensive annotation burden in remote sensing object detection and offers a practical solution for applications such as aerial imagery analysis. The advance is incremental, since it builds on existing weakly and semi-supervised methods.

The paper aims to reduce annotation costs in oriented object detection for remote sensing. It introduces a sparsely annotated setting (SAOOD) and proposes S$^2$Teacher, which mines pseudo-labels for unlabeled objects and reweights their loss to improve performance. On the DOTA dataset, it achieves near-fully-supervised results with only 10% of instances annotated.

Although fully-supervised oriented object detection has made significant progress in multimodal remote sensing image understanding, it comes at the cost of labor-intensive annotation. Recent studies have explored weakly and semi-supervised learning to alleviate this burden. However, these methods overlook the difficulties posed by dense annotations in complex remote sensing scenes. In this paper, we introduce a novel setting called sparsely annotated oriented object detection (SAOOD), in which only some of the instances are labeled, and propose a solution to address its challenges. Specifically, we focus on two key issues in the setting: (1) sparse labeling leads to overfitting on limited foreground representations, and (2) unlabeled objects (false negatives) confuse feature learning. To this end, we propose S$^2$Teacher, a novel method that progressively mines pseudo-labels for unlabeled objects, from easy to hard, to enhance foreground representations. Additionally, it reweights the loss of unlabeled objects to mitigate their impact during training. Extensive experiments demonstrate that S$^2$Teacher not only significantly improves detector performance across different sparse annotation levels but also achieves near-fully-supervised performance on the DOTA dataset with only 10% of instances annotated, effectively balancing detection accuracy with annotation efficiency. The code will be made public.
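The two mechanisms named in the abstract, easy-to-hard pseudo-label mining and loss reweighting for unlabeled objects, can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration rather than the authors' method: the linear threshold schedule, the `1 - teacher_score` weighting, and all names (`Prediction`, `mining_threshold`, `mine_pseudo_labels`, `reweighted_negative_loss`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """A teacher prediction (hypothetical type).

    `box` is assumed to be an oriented box given as (cx, cy, w, h, angle).
    """
    box: tuple
    score: float


def mining_threshold(step, total_steps, start=0.9, end=0.5):
    """Assumed linear schedule: strict early (easy objects), looser later (hard ones)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def mine_pseudo_labels(predictions, step, total_steps):
    """Keep the teacher predictions whose score clears the current threshold."""
    thr = mining_threshold(step, total_steps)
    return [p for p in predictions if p.score >= thr]


def reweighted_negative_loss(neg_losses, teacher_scores, floor=0.0):
    """Down-weight the background loss where the teacher suspects an unlabeled object.

    Assumed weighting: each location gets weight max(floor, 1 - teacher_score),
    so likely false negatives contribute less to the averaged loss.
    """
    weights = [max(floor, 1.0 - s) for s in teacher_scores]
    total = sum(w * l for w, l in zip(weights, neg_losses))
    return total / max(len(neg_losses), 1)


if __name__ == "__main__":
    preds = [
        Prediction((10, 10, 4, 2, 0.1), 0.95),
        Prediction((30, 12, 5, 3, 0.7), 0.75),
        Prediction((50, 40, 6, 2, 1.2), 0.40),
    ]
    # Fewer pseudo-labels are accepted early in training than late.
    print(len(mine_pseudo_labels(preds, step=0, total_steps=10)))
    print(len(mine_pseudo_labels(preds, step=10, total_steps=10)))
```

In this sketch the threshold starts high, so only confident (easy) predictions become pseudo-labels, and it decays so that harder objects are admitted later. The reweighting function addresses the second issue in the abstract by shrinking the background loss at locations where an unlabeled object is likely present.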

