CV · May 8

Disambiguating 2D-3D Correspondences in Gaussian Splatting-based Feature Fields for Visual Localization

Miso Lee, Sangeek Hyun, Yerim Jeon, Jae-Pil Heo

arXiv:2605.07351

AI Analysis

This work targets accurate and efficient visual localization for robotics and AR/VR applications by making photometrically optimized GSFFs suitable for 2D-3D matching.

SplitGS-Loc addresses the instability of PnP-based pose estimation in Gaussian Splatting-based Feature Fields by disambiguating 2D-3D correspondences via Mixture-of-Gaussians splitting and multi-view consistent Gaussian selection, achieving state-of-the-art visual localization without per-scene training or iterative refinement.

While Gaussian Splatting-based Feature Fields (GSFFs) have shown promise for visual localization, this paper highlights that photometrically optimized GSFFs are inherently ill-suited for 2D-3D matching. The volumetric extent of each Gaussian induces many-to-one pixel-to-point mappings that destabilize PnP-based pose estimation, while photometric optimization gives rise to superfluous Gaussians devoid of multi-view consistency. To address these issues, we propose SplitGS-Loc, a localization-specialized GSFF construction framework that disambiguates 2D-3D correspondences by exploiting Gaussian attributes. Our key design, Mixture-of-Gaussians-based splitting, decomposes each Gaussian into smaller Gaussians, replacing ambiguous many-to-one mappings with precise one-to-one correspondences. In parallel, we exploit composition weights from GS rasterization to select Gaussians that contribute significantly and consistently across multiple views, and we aggregate discriminative features through strong pixel-Gaussian associations, enforcing multi-view consistency. The resulting compact yet discriminative feature fields enable stable PnP convergence, achieving state-of-the-art performance on localization benchmarks. Extensive experiments validate that SplitGS-Loc extends the utility of photometric GSFFs to accurate and efficient localization, without per-scene training or iterative pose refinement.
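
The two ideas in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the splitting rule or the selection criterion, so everything below is an assumption made for illustration. `split_gaussian` uses a simple moment-preserving two-way split along the principal axis as a stand-in for Mixture-of-Gaussians-based splitting. `composition_weights` is the standard front-to-back alpha-blending weight used in Gaussian Splatting rasterization. `select_consistent`, along with its `tau` and `min_views` parameters, is a hypothetical rule for "significant and consistent contribution across views".

```python
import numpy as np

def split_gaussian(mean, cov, offset=0.8):
    """Split one 3D Gaussian into two equally weighted children.

    Assumed stand-in for the paper's splitting: children sit at
    mean +/- offset * sigma along the principal axis, and their covariance
    is shrunk so the two-component mixture keeps the parent's mean and
    covariance. Requires 0 < offset < 1.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    lam, axis = eigvals[-1], eigvecs[:, -1]      # principal variance and direction
    shift = offset * np.sqrt(lam) * axis
    child_cov = cov - (offset ** 2) * lam * np.outer(axis, axis)
    means = np.stack([mean - shift, mean + shift])
    covs = np.stack([child_cov, child_cov])
    weights = np.array([0.5, 0.5])
    return means, covs, weights

def composition_weights(alphas):
    """Alpha-blending weights for depth-sorted Gaussians on one pixel ray:
    w_i = alpha_i * prod_{j < i} (1 - alpha_j)."""
    alphas = np.asarray(alphas, dtype=float)
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    return alphas * transmittance

def select_consistent(peak_weights, tau=0.3, min_views=2):
    """Hypothetical selection rule.

    peak_weights has shape (num_views, num_gaussians) and holds each
    Gaussian's largest composition weight in each view. A Gaussian is kept
    if that weight reaches tau in at least min_views views.
    """
    significant = np.asarray(peak_weights) >= tau
    return np.flatnonzero(significant.sum(axis=0) >= min_views)

if __name__ == "__main__":
    means, covs, weights = split_gaussian(np.array([1.0, 2.0, 3.0]),
                                          np.diag([4.0, 1.0, 0.25]))
    print(means)
    print(composition_weights([0.5, 0.5, 1.0]))
    print(select_consistent([[0.6, 0.1, 0.4],
                             [0.5, 0.05, 0.2],
                             [0.7, 0.5, 0.35]]))
```

In this sketch each child Gaussian is narrower than its parent along the dominant axis, so a pixel matched to a child maps to a more localized 3D point, which is the intuition behind replacing many-to-one mappings with one-to-one correspondences. The selection step then keeps only Gaussians whose rendering contribution is strong in several views.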
