6D Pose Estimation with Correlation Fusion
This work benefits robotic tasks such as grasping by providing more accurate pose estimation under heavy occlusion and poor illumination, though the contribution is incremental because it builds on prior RGB-D methods.
The paper tackles 6D object pose estimation by addressing a limitation of existing RGB-D methods, which fail to exploit the consistent and complementary information between the RGB and depth modalities. It reports state-of-the-art performance on the LineMOD and YCB-Video datasets.
6D object pose estimation is widely applied in robotic tasks such as grasping and manipulation. Prior methods that use RGB images alone are vulnerable to heavy occlusion and poor illumination, so it is important to complement them with depth information. However, existing methods that use RGB-D data cannot adequately exploit the consistent and complementary information between the RGB and depth modalities. In this paper, we present a novel method that uses an attention mechanism to model the correlation within and across both modalities and to learn discriminative and compact multi-modal features. We then explore effective fusion strategies for the intra- and inter-correlation modules to ensure efficient information flow between RGB and depth. To the best of our knowledge, this is the first work to explore effective intra- and inter-modality fusion in 6D pose estimation. The experimental results show that our method achieves state-of-the-art performance on the LineMOD and YCB-Video datasets. We also demonstrate that the proposed method can benefit a real-world robot grasping task by providing accurate object pose estimation.
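To make the idea of intra- and inter-modality correlation concrete, the following is a minimal NumPy sketch, not the architecture from the paper. It assumes each modality is already encoded as a matrix of per-point features, uses plain scaled dot-product attention with no learned projections, and applies the intra-modality step before the inter-modality step, which is only one of the possible fusion orders. The function names `attend` and `fuse` and the feature sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attend(query, context):
    """Scaled dot-product attention (assumed form, no learned weights).

    Each row of `query` is replaced by a weighted mix of the rows of
    `context`; the weights come from query-context similarity.
    """
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d), axis=-1)  # (N, M)
    return weights @ context, weights


def fuse(rgb, depth):
    """Hypothetical sequential fusion: intra-modality, then inter-modality."""
    # Intra-modality correlation: each modality attends to itself.
    rgb_intra, _ = attend(rgb, rgb)
    depth_intra, _ = attend(depth, depth)
    # Inter-modality correlation: each modality attends to the other.
    rgb_inter, _ = attend(rgb_intra, depth_intra)
    depth_inter, _ = attend(depth_intra, rgb_intra)
    # Residual sum within each modality, then concatenate the two streams.
    return np.concatenate(
        [rgb_intra + rgb_inter, depth_intra + depth_inter], axis=-1
    )


# Toy input: 5 points with 8-dimensional features per modality (made-up sizes).
rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(5, 8))
depth_feat = rng.normal(size=(5, 8))

fused = fuse(rgb_feat, depth_feat)
_, attn = attend(rgb_feat, depth_feat)
```

In this sketch the fused feature for each point has twice the per-modality width, and every row of the attention matrix sums to one. A real model would add learned query, key, and value projections and a pose regression head on top of the fused features.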