CV · Aug 20, 2019

Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation

arXiv:1908.07433v1 · 0.00 · 532 citations
AI Analysis · 55

This paper addresses 6D object pose estimation for robotics and AR/VR applications. Its approach is robust to occlusion and symmetries, though it builds on existing techniques such as auto-encoders and GANs.

The paper tackles 6D pose estimation from RGB images under occlusion and object symmetries. It proposes Pix2Pose, which predicts the 3D coordinates of each object pixel without requiring textured models, and it reports better results than state-of-the-art RGB-only methods on three benchmark datasets.

Estimating the 6D pose of objects using only RGB images remains challenging because of problems such as occlusion and symmetries. It is also difficult to construct 3D models with precise texture without expert knowledge or specialized scanning devices. To address these problems, we propose a novel pose estimation method, Pix2Pose, that predicts the 3D coordinates of each object pixel without textured models. An auto-encoder architecture is designed to estimate the 3D coordinates and expected errors per pixel. These pixel-wise predictions are then used in multiple stages to form 2D-3D correspondences to directly compute poses with the PnP algorithm with RANSAC iterations. Our method is robust to occlusion by leveraging recent achievements in generative adversarial training to precisely recover occluded parts. Furthermore, a novel loss function, the transformer loss, is proposed to handle symmetric objects by guiding predictions to the closest symmetric pose. Evaluations on three different benchmark datasets containing symmetric and occluded objects show our method outperforms the state of the art using only RGB images.
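The idea behind the transformer loss, guiding predictions to the closest symmetric pose, can be illustrated with a minimal sketch. The abstract does not give the formula, so everything below is an assumption: the object's symmetries are taken to be a small discrete set of rotations, the per-pixel error is a plain mean L1 distance, and the names `rotate`, `l1_coordinate_loss`, `transformer_loss`, `IDENTITY`, and `ROT_Z_180` are hypothetical. It is not the authors' implementation.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def rotate(R: Mat3, p: Vec3) -> List[float]:
    """Apply a 3x3 rotation matrix to a 3D point."""
    return [sum(R[r][c] * p[c] for c in range(3)) for r in range(3)]


def l1_coordinate_loss(pred: Sequence[Vec3], target: Sequence[Vec3]) -> float:
    """Mean per-pixel L1 distance between two lists of 3D coordinates."""
    total = sum(abs(a - b) for p, t in zip(pred, target) for a, b in zip(p, t))
    return total / len(pred)


def transformer_loss(
    pred: Sequence[Vec3], gt: Sequence[Vec3], sym_rotations: Sequence[Mat3]
) -> float:
    """Assumed form: the smallest L1 loss over all symmetric versions of the target.

    Each rotation in sym_rotations maps the object onto itself, so every
    rotated copy of the ground-truth coordinates is an equally valid target.
    Taking the minimum penalises the prediction only against the closest one.
    """
    return min(
        l1_coordinate_loss(pred, [rotate(R, g) for g in gt])
        for R in sym_rotations
    )


# Hypothetical object with a 180-degree symmetry about the z axis.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_180 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]

# Two object pixels with made-up ground-truth 3D coordinates.
gt = [[0.1, 0.2, 0.3], [-0.4, 0.0, 0.5]]

# A prediction that matches the object in its symmetric (flipped) pose.
pred_flipped = [rotate(ROT_Z_180, g) for g in gt]

plain = l1_coordinate_loss(pred_flipped, gt)  # penalises the flipped prediction
sym = transformer_loss(pred_flipped, gt, [IDENTITY, ROT_Z_180])  # does not
```

In this toy case the plain L1 loss treats the flipped prediction as an error, while the minimum over symmetric targets treats it as correct, which is the behaviour the abstract describes for symmetric objects.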

Code Implementations · 3 repos