Real-Time Object Pose Estimation with Pose Interpreter Networks
This work addresses the cost of annotating object poses for robotics and AR/VR applications by training the pose estimator entirely on synthetic data.
The paper tackles 6-DoF object pose estimation by introducing pose interpreter networks that are trained on synthetic data and generalize to real data using object masks, achieving real-time performance at 20 Hz without depth or ICP refinement.
In this work, we introduce pose interpreter networks for 6-DoF object pose estimation. In contrast to other CNN-based approaches to pose estimation that require expensively annotated object pose data, our pose interpreter network is trained entirely on synthetic pose data. We use object masks as an intermediate representation to bridge real and synthetic data. We show that when combined with a segmentation model trained on RGB images, our synthetically trained pose interpreter network is able to generalize to real data. Our end-to-end system for object pose estimation runs in real time (20 Hz) on live RGB data, without using depth information or ICP refinement.
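The two-stage structure described above (RGB image to object mask, then object mask to 6-DoF pose) can be illustrated with a minimal sketch. This is not the authors' implementation: `stub_segment` and `stub_interpret` are hypothetical stand-ins for the trained segmentation model and pose interpreter network, and representing orientation as a unit quaternion is an assumption made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]          # RGB values
Image = List[List[Pixel]]             # H x W RGB image
Mask = List[List[int]]                # H x W binary object mask
Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]


@dataclass
class Pose:
    """6-DoF pose: 3 translational + 3 rotational degrees of freedom."""
    position: Vec3
    orientation: Quat  # unit quaternion (w, x, y, z); an assumed convention


def normalize_quaternion(q: Quat) -> Quat:
    """Scale a quaternion to unit length so it encodes a valid rotation."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("zero quaternion cannot be normalized")
    return tuple(c / norm for c in q)


def stub_segment(image: Image) -> Mask:
    """Hypothetical stand-in for the segmentation model trained on real RGB:
    marks every non-black pixel as belonging to the object."""
    return [[1 if sum(px) > 0 else 0 for px in row] for row in image]


def stub_interpret(mask: Mask) -> Tuple[Vec3, Quat]:
    """Hypothetical stand-in for the pose interpreter network trained on
    synthetic data: uses the mask centroid as (x, y), a fixed depth of 1.0,
    and an unnormalized identity rotation."""
    cells = [(r, c) for r, row in enumerate(mask)
             for c, v in enumerate(row) if v]
    if not cells:
        raise ValueError("empty mask: no object found")
    cy = sum(r for r, _ in cells) / len(cells)
    cx = sum(c for _, c in cells) / len(cells)
    return (cx, cy, 1.0), (2.0, 0.0, 0.0, 0.0)


def estimate_pose(
    image: Image,
    segment: Callable[[Image], Mask],
    interpret: Callable[[Mask], Tuple[Vec3, Quat]],
) -> Pose:
    """The mask is the only information passed between the two stages, so
    the second stage never sees real RGB pixels."""
    mask = segment(image)
    position, raw_quat = interpret(mask)
    return Pose(position, normalize_quaternion(raw_quat))
```

Because the second stage consumes only masks, it can be trained on rendered masks alone, which is what allows the mask to bridge real and synthetic data. At 20 Hz, the full pipeline has a budget of 1/20 s = 50 ms per frame.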