End-to-End Differentiable 6DoF Object Pose Estimation with Local and Global Constraints
This work improves 6DoF object pose estimation for computer vision systems operating in occluded environments, a capability that is crucial for robotics and augmented reality applications.
This paper addresses the challenge of 6DoF object pose estimation from a single RGB image, particularly under heavy occlusion. The authors propose integrating local constraints through pairwise feature extraction and global constraints through triplet regularization, combined with improved data augmentation. This approach achieves a 9% improvement over the previous state of the art on the Occlusion Linemod dataset and competitive results on the Linemod dataset.
Inferring the 6DoF pose of an object from a single RGB image is an important but challenging task, especially under heavy occlusion. While recent approaches improve upon two-stage approaches by training an end-to-end pipeline, they do not leverage local and global constraints. In this paper, we propose pairwise feature extraction to integrate local constraints, and triplet regularization to integrate global constraints, for improved 6DoF object pose estimation. Coupled with better augmentation, our approach achieves state-of-the-art results on the challenging Occlusion Linemod dataset, with a 9% improvement over the previous state of the art, and achieves competitive results on the Linemod dataset.
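For readers unfamiliar with the term, a 6DoF pose consists of a 3D rotation (three degrees of freedom) and a 3D translation (three more) that map points from the object's model frame into the camera frame. The sketch below only illustrates that definition together with a standard pinhole projection; it is not the method proposed in this paper. The function names, the choice of a rotation about the z-axis, and the intrinsics values are all illustrative assumptions.

```python
import math


def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def apply_pose(R, t, point):
    """Map a 3D model point into the camera frame: R @ point + t."""
    return [sum(R[i][j] * point[j] for j in range(3)) + t[i]
            for i in range(3)]


def project(point_cam, fx, fy, cx, cy):
    """Project a camera-frame point to pixels with a pinhole model."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical pose: 90-degree rotation about z, 2 units in front of the camera.
R = rotation_z(math.pi / 2)
t = [0.0, 0.0, 2.0]

# Hypothetical model point and camera intrinsics (assumed values).
model_point = [1.0, 0.0, 0.0]
u, v = project(apply_pose(R, t, model_point),
               fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

A pose estimator has to recover `R` and `t` from image evidence alone. Under heavy occlusion, many model points have no visible projection, which is one reason the task becomes harder.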