CV · Apr 2

MonoSAOD: Monocular 3D Object Detection with Sparsely Annotated Label

arXiv:2604.01646 · 19.3 · Has Code

AI Analysis

The paper targets the high cost of 3D annotation in real-world scenarios by learning from sparsely annotated data. The contribution appears incremental, since it builds on existing monocular detection methods.

The paper tackles monocular 3D object detection under sparse annotation, where only a fraction of objects are labeled. It proposes a framework built on two modules, Road-Aware Patch Augmentation and Prototype-Based Filtering, and reports robust detection performance in its experiments.

Monocular 3D object detection has achieved impressive performance on densely annotated datasets. However, it struggles when only a fraction of objects are labeled due to the high cost of 3D annotation. This sparsely annotated setting is common in real-world scenarios where annotating every object is impractical. To address this, we propose a novel framework for sparsely annotated monocular 3D object detection with two key modules. First, we propose Road-Aware Patch Augmentation (RAPA), which leverages sparse annotations by augmenting segmented object patches onto road regions while preserving 3D geometric consistency. Second, we propose Prototype-Based Filtering (PBF), which generates high-quality pseudo-labels by filtering predictions through prototype similarity and depth uncertainty. It maintains global 2D RoI feature prototypes and selects pseudo-labels that are both feature-consistent with learned prototypes and have reliable depth estimates. Our training strategy combines geometry-preserving augmentation with prototype-guided pseudo-labeling to achieve robust detection under sparse supervision. Extensive experiments demonstrate the effectiveness of the proposed method. The source code is available at https://github.com/VisualAIKHU/MonoSAOD .
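The filtering rule described for PBF can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that pseudo-labels must be feature-consistent with global 2D RoI prototypes and have reliable depth estimates. The cosine similarity measure, the exponential-moving-average prototype update, the threshold values, and all names below (`PrototypeBank`, `filter_pseudo_labels`, `depth_unc`, and the dictionary fields) are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class PrototypeBank:
    """Hypothetical per-class store of global RoI feature prototypes."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum  # assumed EMA momentum, not from the paper
        self.protos = {}          # class name -> prototype feature vector

    def update(self, cls, feat):
        """Blend a labeled object's RoI feature into its class prototype."""
        old = self.protos.get(cls)
        if old is None:
            self.protos[cls] = list(feat)
        else:
            m = self.momentum
            self.protos[cls] = [m * o + (1 - m) * f for o, f in zip(old, feat)]


def filter_pseudo_labels(preds, bank, sim_thresh=0.8, unc_thresh=0.5):
    """Keep predictions that match their class prototype AND have low
    depth uncertainty. Both thresholds are illustrative assumptions."""
    kept = []
    for p in preds:
        proto = bank.protos.get(p["cls"])
        if proto is None:
            continue  # no prototype yet for this class, so nothing to compare
        similar = cosine(p["feat"], proto) >= sim_thresh
        reliable = p["depth_unc"] <= unc_thresh
        if similar and reliable:
            kept.append(p)
    return kept


# Usage: one prototype built from a labeled "Car", then three predictions.
bank = PrototypeBank()
bank.update("Car", [1.0, 0.0])
preds = [
    {"cls": "Car", "feat": [0.9, 0.1], "depth_unc": 0.2},  # similar, reliable
    {"cls": "Car", "feat": [0.0, 1.0], "depth_unc": 0.2},  # dissimilar feature
    {"cls": "Car", "feat": [1.0, 0.0], "depth_unc": 0.9},  # uncertain depth
]
kept = filter_pseudo_labels(preds, bank)
```

Only the first prediction passes both checks. Requiring both conditions jointly mirrors the abstract's wording that selected pseudo-labels are "both feature-consistent with learned prototypes and have reliable depth estimates"; how the two criteria are actually scored and combined should be checked against the released code.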
