31.0CVMar 16
From Horizontal to Rotated: Cross-View Object Geo-Localization with Orientation AwarenessChenlin Fu, Ao Gong, Yingying Zhu
Cross-View object geo-localization (CVOGL) aims to precisely determine the geographic coordinates of a query object from a ground or drone perspective by referencing a satellite map. Segmentation-based approaches offer high precision but require prohibitively expensive pixel-level annotations, whereas more economical detection-based methods suffer from lower accuracy. This performance disparity in detection is primarily caused by two factors: the poor geometric fit of Horizontal Bounding Boxes (HBoxes) for oriented objects and the degradation in precision due to feature map scaling. Motivated by these, we propose leveraging Rotated Bounding Boxes (RBoxes) as a natural extension of the detection-based paradigm. RBoxes provide a much tighter geometric fit to oriented objects. Building on this, we introduce OSGeo, a novel geo-localization framework, meticulously designed with a multi-scale perception module and an orientation-sensitive head to accurately regress RBoxes. To support this scheme, we also construct and release CVOGL-R, the first dataset with precise RBox annotations for CVOGL. Extensive experiments demonstrate that our OSGeo achieves state-of-the-art performance, consistently matching or even surpassing the accuracy of leading segmentation-based methods but with an annotation cost that is over an order of magnitude lower.
CVSep 30, 2025
Anchor-free Cross-view Object Geo-localization with Gaussian Position Encoding and Cross-view AssociationXingtao Ling, Chenlin Fu, Yingying Zhu
Most existing cross-view object geo-localization approaches adopt anchor-based paradigm. Although effective, such methods are inherently constrained by predefined anchors. To eliminate this dependency, we first propose an anchor-free formulation for cross-view object geo-localization, termed AFGeo. AFGeo directly predicts the four directional offsets (left, right, top, bottom) to the ground-truth box for each pixel, thereby localizing the object without any predefined anchors. To obtain a more robust spatial prior, AFGeo incorporates Gaussian Position Encoding (GPE) to model the click point in the query image, mitigating the uncertainty of object position that challenges object localization in cross-view scenarios. In addition, AFGeo incorporates a Cross-view Object Association Module (CVOAM) that relates the same object and its surrounding context across viewpoints, enabling reliable localization under large cross-view appearance gaps. By adopting an anchor-free localization paradigm that integrates GPE and CVOAM with minimal parameter overhead, our model is both lightweight and computationally efficient, achieving state-of-the-art performance on benchmark datasets.