Category-Level 6D Object Pose Estimation via Cascaded Relation and Recurrent Reconstruction Networks
This addresses a fundamental problem in robotic manipulation and augmented reality by enabling accurate pose estimation for unseen objects, representing a strong specific gain in the domain.
The paper tackles category-level 6D object pose estimation for unseen instances by proposing cascaded relation and recurrent reconstruction networks, achieving significant performance improvements such as exceeding the state-of-the-art SPD by up to 17.7% on strict metrics.
Category-level 6D pose estimation, aiming to predict the location and orientation of unseen object instances, is fundamental to many scenarios such as robotic manipulation and augmented reality, yet still remains unsolved. Precisely recovering instance 3D model in the canonical space and accurately matching it with the observation is an essential point when estimating 6D pose for unseen objects. In this paper, we achieve accurate category-level 6D pose estimation via cascaded relation and recurrent reconstruction networks. Specifically, a novel cascaded relation network is dedicated for advanced representation learning to explore the complex and informative relations among instance RGB image, instance point cloud and category shape prior. Furthermore, we design a recurrent reconstruction network for iterative residual refinement to progressively improve the reconstruction and correspondence estimations from coarse to fine. Finally, the instance 6D pose is obtained leveraging the estimated dense correspondences between the instance point cloud and the reconstructed 3D model in the canonical space. We have conducted extensive experiments on two well-acknowledged benchmarks of category-level 6D pose estimation, with significant performance improvement over existing approaches. On the representatively strict evaluation metrics of $3D_{75}$ and $5^{\circ}2 cm$, our method exceeds the latest state-of-the-art SPD by $4.9\%$ and $17.7\%$ on the CAMERA25 dataset, and by $2.7\%$ and $8.5\%$ on the REAL275 dataset. Codes are available at https://wangjiaze.cn/projects/6DPoseEstimation.html.