CV · Aug 18, 2021

SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

arXiv:2108.08367v1 · 167 citations
AI Analysis

This work addresses the accuracy gap between end-to-end 6D pose estimation and PnP/RANSAC-based methods, a gap that matters for robotics and AR/VR applications. It is an incremental improvement that integrates self-occlusion reasoning into existing end-to-end pipelines.

The paper tackles direct regression of 6D object pose from a single RGB image. Such end-to-end regression has so far been less accurate than PnP/RANSAC-based methods. The proposed framework, SO-Pose, adds self-occlusion reasoning to close this gap and achieves state-of-the-art or competitive results on several datasets.

Directly regressing all 6 degrees of freedom (6DoF) of the object pose (i.e. the 3D rotation and translation) from a single RGB image of a cluttered environment is a challenging problem. While end-to-end methods have recently demonstrated promising results at high efficiency, they are still inferior to elaborate PnP/RANSAC-based approaches in terms of pose accuracy. In this work, we address this shortcoming through novel reasoning about self-occlusion, which establishes a two-layer representation for 3D objects and considerably enhances the accuracy of end-to-end 6D pose estimation. Our framework, named SO-Pose, takes a single RGB image as input and generates 2D-3D correspondences and self-occlusion information using a shared encoder and two separate decoders. Both outputs are then fused to directly regress the 6DoF pose parameters. By incorporating cross-layer consistencies that align correspondences, self-occlusion and 6D pose, we further improve accuracy and robustness, surpassing or rivaling all other state-of-the-art approaches on various challenging datasets.
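
The abstract speaks of regressing the 6DoF pose parameters and of consistency between 2D-3D correspondences and the pose. The Python sketch below is not SO-Pose's implementation. It only illustrates, assuming a standard pinhole camera, what the six parameters are (a rotation, here encoded as axis-angle, plus a 3D translation) and one common way to measure whether a set of 2D-3D correspondences agrees with a pose, namely the mean reprojection error. The function names, the axis-angle parameterization and the intrinsics values are hypothetical choices for illustration; the paper's own rotation representation and cross-layer consistency terms may differ.

```python
import math


def rotation_from_axis_angle(axis, angle):
    """Return a 3x3 rotation matrix (list of rows) via Rodrigues' formula.

    `axis` is any non-zero 3-vector (normalized here) and `angle` is in
    radians; together they encode the 3 rotational degrees of freedom.
    """
    x, y, z = axis
    n = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def project(point, R, t, fx, fy, cx, cy):
    """Project an object-frame 3D point to pixel coordinates under pose (R, t).

    The camera-frame point is X_c = R @ X + t (t carries the 3 translational
    DoF), followed by a pinhole projection with focal lengths fx, fy and
    principal point (cx, cy).
    """
    Xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    return (fx * Xc[0] / Xc[2] + cx, fy * Xc[1] / Xc[2] + cy)


def mean_reprojection_error(correspondences, R, t, intrinsics):
    """Mean pixel distance between observed 2D points and their reprojected 3D partners.

    `correspondences` is a list of ((u, v), (X, Y, Z)) pairs: a pixel and the
    object-frame 3D coordinate predicted for it. A small value means the
    correspondences and the pose (R, t) agree.
    """
    if not correspondences:
        raise ValueError("need at least one correspondence")
    total = 0.0
    for (u, v), P in correspondences:
        pu, pv = project(P, R, t, *intrinsics)
        total += math.hypot(pu - u, pv - v)
    return total / len(correspondences)


if __name__ == "__main__":
    K = (500.0, 500.0, 320.0, 240.0)  # hypothetical intrinsics (fx, fy, cx, cy)
    R = rotation_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)  # 90 degrees about z
    t = (0.0, 0.0, 1.0)  # object placed 1 unit in front of the camera
    model = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (-0.05, 0.05, 0.0)]
    # Synthetic, noise-free correspondences generated from the true pose.
    corr = [(project(P, R, t, *K), P) for P in model]
    print(mean_reprojection_error(corr, R, t, K))                 # close to 0.0
    print(mean_reprojection_error(corr, R, (0.01, 0.0, 1.0), K))  # about 5.0 px
```

Under the true pose the noise-free correspondences give a near-zero error, while shifting the translation by 0.01 along x moves every point (all at depth 1) by fx × 0.01 = 5 pixels. Roughly speaking, PnP/RANSAC pipelines search for the pose that minimizes this kind of error over inlier correspondences, whereas an end-to-end network regresses the pose directly.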

Code implementations: 2 repos