GS-Pose: Generalizable Segmentation-based 6D Object Pose Estimation with 3D Gaussian Splatting
It addresses object pose estimation for robotics and AR applications, but appears incremental as it builds on existing segmentation and retrieval methods with a new rendering technique.
The paper tackles the problem of estimating 6D poses for novel objects by introducing GS-Pose, a framework that uses 3D Gaussian splatting for refinement, achieving state-of-the-art results on LINEMOD and OnePose-LowTexture datasets.
This paper introduces GS-Pose, a unified framework for localizing and estimating the 6D pose of novel objects. GS-Pose begins with a set of posed RGB images of a previously unseen object and builds three distinct representations stored in a database. At inference, GS-Pose operates sequentially by locating the object in the input image, estimating its initial 6D pose using a retrieval approach, and refining the pose with a render-and-compare method. The key insight is the application of the appropriate object representation at each stage of the process. In particular, for the refinement step, we leverage 3D Gaussian splatting, a novel differentiable rendering technique that offers high rendering speed and relatively low optimization time. Off-the-shelf toolchains and commodity hardware, such as mobile phones, can be used to capture new objects to be added to the database. Extensive evaluations on the LINEMOD and OnePose-LowTexture datasets demonstrate excellent performance, establishing the new state-of-the-art. Project page: https://dingdingcai.github.io/gs-pose.