CV · RO · Mar 7, 2025

SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting

arXiv:2503.05174v1 · 6 citations · h-index: 2 · IROS
Originality: Incremental advance
AI Analysis

The paper addresses a fundamental computer vision task with applications in augmented reality and robotics, and it offers a cost-effective alternative to depth or multi-view setups. The advance appears incremental, however, because it builds on existing 3D Gaussian Splatting techniques.

The paper tackles 6-DoF pose estimation from a single RGB image, where accuracy often suffers from rotational ambiguity and reliance on initial pose estimates. It introduces SplatPose, a framework that combines 3D Gaussian Splatting with a dual-branch neural architecture, and reports state-of-the-art accuracy in the single-RGB setting, rivaling methods that use depth or multi-view images.

6-DoF pose estimation is a fundamental task in computer vision with wide-ranging applications in augmented reality and robotics. Existing single RGB-based methods often compromise accuracy due to their reliance on initial pose estimates and susceptibility to rotational ambiguity, while approaches requiring depth sensors or multi-view setups incur significant deployment costs. To address these limitations, we introduce SplatPose, a novel framework that synergizes 3D Gaussian Splatting (3DGS) with a dual-branch neural architecture to achieve high-precision pose estimation using only a single RGB image. Central to our approach is the Dual-Attention Ray Scoring Network (DARS-Net), which innovatively decouples positional and angular alignment through geometry-domain attention mechanisms, explicitly modeling directional dependencies to mitigate rotational ambiguity. Additionally, a coarse-to-fine optimization pipeline progressively refines pose estimates by aligning dense 2D features between query images and 3DGS-synthesized views, effectively correcting feature misalignment and depth errors from sparse ray sampling. Experiments on three benchmark datasets demonstrate that SplatPose achieves state-of-the-art 6-DoF pose estimation accuracy in single RGB settings, rivaling approaches that depend on depth or multi-view images.
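The abstract does not say which error measures the paper reports. As a rough illustration of what "6-DoF pose estimation accuracy" usually means, the sketch below assumes the common convention of scoring a pose by its translation error (Euclidean distance) and its rotation error (geodesic angle between rotation matrices). The function names and the sample poses are hypothetical and are not taken from SplatPose.

```python
import math

def rotation_z(angle_deg):
    """Return the 3x3 matrix for a rotation of angle_deg about the z-axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error(t_est, t_gt):
    """Euclidean distance between two 3D translations (same unit as the inputs)."""
    return math.dist(t_est, t_gt)

# Hypothetical estimated and ground-truth poses: (rotation, translation).
R_est, t_est = rotation_z(32.0), (0.10, 0.00, 1.05)
R_gt, t_gt = rotation_z(30.0), (0.10, 0.00, 1.00)

print(f"rotation error:    {rotation_error_deg(R_est, R_gt):.2f} deg")
print(f"translation error: {translation_error(t_est, t_gt):.3f}")
```

Reporting the two errors separately keeps the three rotational and three translational degrees of freedom distinguishable, which matters when a method's main failure mode is rotational ambiguity.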

