CV | AI | Feb 6, 2025

Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

arXiv:2502.04268v2 | 17 citations | h-index: 15 | Has Code | CVPR
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reducing annotation costs for oriented object detection in densely packed scenes, representing an incremental improvement over prior point-supervised methods.

The paper tackles oriented object detection from point annotations alone. It introduces Point2RBox-v2, which exploits the spatial layout among instances through three losses (Gaussian overlap, Voronoi watershed, and consistency) and reports 62.61%, 86.15%, and 34.71% on the DOTA, HRSC, and FAIR1M datasets, respectively.

With the rapidly increasing demand for oriented object detection (OOD), recent research on weakly-supervised detectors that learn OOD from point annotations has gained great attention. In this paper, we rethink this challenging task setting with the layout among instances and present Point2RBox-v2. At the core are three principles: 1) Gaussian overlap loss. It learns an upper bound for each instance by treating objects as 2D Gaussian distributions and minimizing their overlap. 2) Voronoi watershed loss. It learns a lower bound for each instance through watershed on Voronoi tessellation. 3) Consistency loss. It learns the size/rotation variation between two output sets with respect to an input image and its augmented view. Supplemented by a few devised techniques, e.g., edge loss and copy-paste, the detector is further enhanced. To the best of our knowledge, Point2RBox-v2 is the first approach to explore the spatial layout among instances for learning point-supervised OOD. Our solution is elegant and lightweight, yet it is expected to give competitive performance, especially in densely packed scenes: 62.61%/86.15%/34.71% on DOTA/HRSC/FAIR1M. Code is available at https://github.com/VisionXLab/point2rbox-v2.
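
To make the first principle concrete, the following is a minimal sketch of how overlap between objects modeled as 2D Gaussians could be penalized. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say which overlap measure is used, so the Bhattacharyya coefficient is assumed here. The box-to-Gaussian convention (covariance built from half side lengths) and the function names `rbox_to_gaussian`, `bhattacharyya_coefficient`, and `gaussian_overlap_loss` are also hypothetical choices made for this example.

```python
import numpy as np


def rbox_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box (center, size, angle in radians) to a 2D Gaussian.

    Assumption: covariance = R diag(w^2/4, h^2/4) R^T, a common convention
    in Gaussian-based rotated-box losses; the paper's choice may differ.
    """
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    scale = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])
    mean = np.array([cx, cy], dtype=float)
    return mean, rot @ scale @ rot.T


def bhattacharyya_coefficient(mu1, cov1, mu2, cov2):
    """Overlap in (0, 1] between two Gaussians; 1 means identical."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu2 - mu1
    # Bhattacharyya distance: Mahalanobis-like term plus a shape term.
    dist = 0.125 * diff @ np.linalg.solve(cov, diff)
    dist += 0.5 * np.log(
        np.linalg.det(cov)
        / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))
    )
    return float(np.exp(-dist))


def gaussian_overlap_loss(boxes):
    """Sum of pairwise overlaps over all instance pairs in one image.

    Minimizing this discourages predicted boxes from growing into their
    neighbours, which acts as an upper bound on each instance's size.
    """
    gaussians = [rbox_to_gaussian(*b) for b in boxes]
    total = 0.0
    for i in range(len(gaussians)):
        for j in range(i + 1, len(gaussians)):
            total += bhattacharyya_coefficient(*gaussians[i], *gaussians[j])
    return total


if __name__ == "__main__":
    # Two neighbours 10 px apart: larger predicted boxes overlap more.
    small = gaussian_overlap_loss([(0, 0, 4, 4, 0.0), (10, 0, 4, 4, 0.0)])
    large = gaussian_overlap_loss([(0, 0, 12, 12, 0.0), (10, 0, 12, 12, 0.0)])
    print(small, large)
```

In this sketch the loss grows as the predicted boxes of neighbouring instances expand toward each other, which is the sense in which an overlap penalty bounds instance size from above. A training implementation would compute the same quantity with differentiable tensor operations rather than NumPy loops.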

Code Implementations: 1 repo