CV · Nov 16, 2023

Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation

arXiv:2311.09500v3 · 6 citations · h-index: 3
Originality: Incremental advance
AI Analysis

The paper addresses the cost of annotation in 6DoF pose estimation for robotics and AR/VR by enabling training without real ground-truth data. The advance is incremental, since it builds on existing self-supervised and keypoint-based methods.

The paper tackles the simulation-to-real domain gap in 6DoF pose estimation with a self-supervised keypoint-voting framework built on a learnable kernel in RKHS. It reports state-of-the-art performance among self-supervised methods (for example, +4.2% on LINEMOD) and results within -11.3% to +0.2% of the top fully supervised methods on the BOP datasets.

We address the simulation-to-real domain gap in six degree-of-freedom pose estimation (6DoF PE), and propose a novel self-supervised keypoint voting-based 6DoF PE framework, effectively narrowing this gap using a learnable kernel in RKHS. We formulate this domain gap as a distance in high-dimensional feature space, distinct from previous iterative matching methods. We propose an adapter network, which is pre-trained on purely synthetic data with synthetic ground truth poses, and which evolves the network parameters from this source synthetic domain to the target real domain. Importantly, the real data training only uses pseudo-poses estimated by pseudo-keypoints, and thereby requires no real ground truth data annotations. Our proposed method is called RKHSPose, and achieves state-of-the-art performance among self-supervised methods on three commonly used 6DoF PE datasets including LINEMOD (+4.2%), Occlusion LINEMOD (+2%), and YCB-Video (+3%). It also compares favorably to fully supervised methods on all six applicable BOP core datasets, achieving within -11.3% to +0.2% of the top fully supervised results.
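The abstract frames the domain gap as a distance in a high-dimensional feature space measured through a kernel in RKHS. As a rough illustration only, the sketch below computes the maximum mean discrepancy (MMD), a standard RKHS distance between two sets of feature vectors. It is an assumption that MMD is a fair stand-in here: the abstract does not name the exact distance RKHSPose uses, and the method learns its kernel, whereas this sketch uses a fixed Gaussian kernel. The function names, the `bandwidth` parameter, and the random "features" are all hypothetical.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two feature sets.

    Equals the squared RKHS distance between the empirical mean
    embeddings of `source` and `target`.
    """
    k_ss = rbf_kernel(source, source, bandwidth)
    k_tt = rbf_kernel(target, target, bandwidth)
    k_st = rbf_kernel(source, target, bandwidth)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()


# Hypothetical stand-ins for network features (not real data):
# "synthetic" and "real" features differ by a mean shift.
rng = np.random.default_rng(0)
synthetic_feats = rng.normal(0.0, 1.0, size=(200, 16))
real_feats = rng.normal(0.5, 1.0, size=(200, 16))
same_domain = rng.normal(0.0, 1.0, size=(200, 16))

gap_shifted = mmd_squared(synthetic_feats, real_feats, bandwidth=4.0)
gap_same = mmd_squared(synthetic_feats, same_domain, bandwidth=4.0)
print(f"shifted domains: {gap_shifted:.4f}, same domain: {gap_same:.4f}")
```

In this toy setting the shifted pair yields a larger value than the same-domain pair, which is the kind of signal a domain-adaptation loss would try to minimize. A learnable kernel would replace the fixed `bandwidth` with trainable parameters.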

