SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation
This work addresses global localization for robotics and AR/VR applications, though the contribution appears incremental because it builds on existing 3DGS representations.
The paper tackles camera pose estimation by proposing SGLoc, a system that regresses 6DoF poses from a 3D Gaussian Splatting representation using semantic information. It reports better performance than baselines on the 12scenes and 7scenes datasets without requiring initial pose priors.
We propose SGLoc, a novel localization system that directly regresses camera poses from a 3D Gaussian Splatting (3DGS) representation by leveraging semantic information. Our method uses the semantic relationship between the 2D image and the 3D scene representation to estimate the 6DoF pose without prior pose information. We introduce a multi-level pose regression strategy that progressively estimates and refines the pose of the query image against the global 3DGS map. We also introduce a semantic-based global retrieval algorithm that establishes correspondences between the 2D image and the 3DGS map. By matching the scene semantic descriptors extracted from the 2D query image against the 3DGS semantic representation, we align the image with a local region of the global 3DGS map and thereby obtain a coarse pose estimate. We then refine the coarse pose by iteratively minimizing the difference between the query image and the image rendered from the 3DGS map. SGLoc outperforms baselines on the 12scenes and 7scenes datasets, demonstrating global localization without an initial pose prior. Code will be available at https://github.com/IRMVLab/SGLoc.
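The coarse-to-fine flow described above can be illustrated with a minimal Python sketch. It is not the SGLoc implementation. It assumes that each local region of the map already has a fixed-length semantic descriptor and an associated reference pose, it swaps in plain cosine-similarity nearest-neighbor search for the retrieval step, and it swaps in finite-difference gradient descent on a caller-supplied loss for the render-and-compare refinement. The names `retrieve_coarse_pose`, `refine_pose`, `region_descriptors`, `region_poses`, and `loss` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_coarse_pose(query_descriptor, region_descriptors, region_poses):
    """Coarse stage (assumed form): pick the map region whose descriptor is most
    similar to the query descriptor and return (index, pose, score)."""
    if not region_descriptors or len(region_descriptors) != len(region_poses):
        raise ValueError("need one pose per region descriptor, and at least one region")
    scores = [cosine_similarity(query_descriptor, d) for d in region_descriptors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, region_poses[best], scores[best]


def refine_pose(pose, loss, step=0.1, iters=100, eps=1e-5):
    """Fine stage (assumed form): gradient descent on loss(pose), with the gradient
    estimated by central finite differences. In a real system, loss would compare
    the query image with an image rendered from the 3DGS map at `pose`."""
    pose = list(pose)
    for _ in range(iters):
        grad = []
        for i in range(len(pose)):
            plus = pose[:i] + [pose[i] + eps] + pose[i + 1:]
            minus = pose[:i] + [pose[i] - eps] + pose[i + 1:]
            grad.append((loss(plus) - loss(minus)) / (2.0 * eps))
        pose = [p - step * g for p, g in zip(pose, grad)]
    return pose
```

In this toy setting a pose is just a list of floats. A usage example would call `retrieve_coarse_pose` with the query descriptor to get a starting pose, then pass that pose to `refine_pose` together with a loss that measures the image difference. A practical system would parameterize the pose on SE(3) and use a differentiable renderer instead of finite differences.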