CV, Jan 29

Rectifying Geometry-Induced Similarity Distortions for Real-World Aerial-Ground Person Re-Identification

arXiv:2601.21405v2 1.5 h-index: 5

Originality: Incremental advance

AI Analysis

This addresses a specific challenge in aerial-ground person re-identification for surveillance applications, representing an incremental improvement over existing methods.

The paper tackles the problem of aerial-ground person re-identification, where extreme viewpoint and distance discrepancies cause geometric distortions that degrade attention-based matching; the proposed Geometry-Induced Query-Key Transformation (GIQT) framework rectifies these distortions and improves robustness across four benchmarks with minimal computational overhead.

Aerial-ground person re-identification (AG-ReID) is fundamentally challenged by extreme viewpoint and distance discrepancies between aerial and ground cameras, which induce severe geometric distortions and invalidate the assumption of a shared similarity space across views. Existing methods primarily rely on geometry-aware feature learning or appearance-conditioned prompting, while implicitly assuming that the geometry-invariant dot-product similarity used in attention mechanisms remains reliable under large viewpoint and scale variations. We argue that this assumption does not hold. Extreme camera geometry systematically distorts the query-key similarity space and degrades attention-based matching, even when feature representations are partially aligned. To address this issue, we introduce Geometry-Induced Query-Key Transformation (GIQT), a lightweight low-rank module that explicitly rectifies the similarity space by conditioning query-key interactions on camera geometry. Rather than modifying feature representations or the attention formulation itself, GIQT adapts the similarity computation to compensate for dominant geometry-induced anisotropic distortions. Building on this local similarity rectification, we further incorporate a geometry-conditioned prompt generation mechanism that provides global, view-adaptive representation priors derived directly from camera geometry. Experiments on four aerial-ground person re-identification benchmarks demonstrate that the proposed framework consistently improves robustness under extreme and previously unseen geometric conditions, while introducing minimal computational overhead compared to state-of-the-art methods.
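
The abstract does not give the exact GIQT formulation, so the following is only a minimal sketch of one way a low-rank, geometry-conditioned correction to query-key similarity could look. It assumes the rectified logit takes the form q^T (I + U V^T) k, where the rank-r factors U and V are linear functions of a camera-geometry vector. The function name `low_rank_rectified_scores`, the weight tensors `W_u` and `W_v`, and the geometry vector `geom` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def low_rank_rectified_scores(Q, K, geom, W_u, W_v):
    """Scaled attention logits q^T (I + U V^T) k with geometry-dependent U, V.

    Q:    (n, d) query vectors
    K:    (m, d) key vectors
    geom: (g,)   camera-geometry descriptor (hypothetical, e.g. angle and distance)
    W_u, W_v: (g, d, r) hypothetical linear maps from geometry to rank-r factors
    """
    d = Q.shape[1]
    # Predict the low-rank factors from the geometry vector: each is (d, r).
    U = np.tensordot(geom, W_u, axes=1)
    V = np.tensordot(geom, W_v, axes=1)
    # Standard dot-product similarity.
    base = Q @ K.T
    # Low-rank correction, computed without forming the (d, d) matrix U V^T.
    correction = (Q @ U) @ (K @ V).T
    return (base + correction) / np.sqrt(d)

# Toy usage with random data (shapes chosen only for illustration).
rng = np.random.default_rng(0)
n, m, d, r, g = 4, 6, 16, 2, 3
Q = rng.normal(size=(n, d))
K = rng.normal(size=(m, d))
geom = rng.normal(size=g)
W_u = 0.1 * rng.normal(size=(g, d, r))
W_v = 0.1 * rng.normal(size=(g, d, r))
scores = low_rank_rectified_scores(Q, K, geom, W_u, W_v)
```

In this sketch the correction costs O((n + m) d r) extra work on top of the usual O(n m d) similarity, which is consistent with the "lightweight" description, and setting `W_u` or `W_v` to zero recovers plain scaled dot-product attention logits.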
