ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation
This work addresses accurate 6DoF object pose estimation for robotics and AR/VR applications, proposing a new method for a long-standing bottleneck.
The paper tackles 6DoF object pose estimation by introducing a discrete descriptor that represents the object surface densely, built with hierarchical binary grouping and trained coarse to fine. It reports major improvements in the ADD(-S) metric on the LM-O and YCB-V datasets, in some cases surpassing RGB-D based methods.
Establishing correspondences from image to 3D has long been a key task in 6DoF object pose estimation. To predict pose more accurately, deeply learned dense maps replaced sparse templates. Dense methods also improved pose estimation in the presence of occlusion. More recently, researchers have shown improvements by learning object fragments as segmentation. In this work, we present a discrete descriptor that can represent the object surface densely. By incorporating a hierarchical binary grouping, we can encode the object surface very efficiently. Moreover, we propose a coarse-to-fine training strategy, which enables fine-grained correspondence prediction. Finally, by matching predicted codes with the object surface and using a PnP solver, we estimate the 6DoF pose. Results on the public LM-O and YCB-V datasets show major improvement over the state of the art w.r.t. the ADD(-S) metric, even surpassing RGB-D based methods in some cases.
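To make the idea of hierarchical binary grouping concrete, the following is a minimal sketch, not the authors' implementation. It assumes the surface is given as an array of 3D vertices and that each group is split into two equal halves at the median along its axis of largest extent; the abstract does not specify the grouping rule, so this split is an assumption. The function names encode_surface and decode_codes are hypothetical. Each split contributes one bit, so early bits describe coarse surface regions and later bits describe fine ones.

```python
import numpy as np

def encode_surface(vertices, num_bits):
    """Give every vertex a num_bits binary code via recursive balanced splits.

    Assumption: groups are halved at the median along their widest axis.
    """
    n = len(vertices)
    codes = np.zeros((n, num_bits), dtype=np.uint8)
    groups = [np.arange(n)]  # start with one group holding all vertices
    for bit in range(num_bits):
        next_groups = []
        for idx in groups:
            if len(idx) < 2:
                next_groups.append(idx)  # cannot split further, bit stays 0
                continue
            pts = vertices[idx]
            axis = np.argmax(pts.max(axis=0) - pts.min(axis=0))
            order = np.argsort(pts[:, axis], kind="stable")
            half = len(idx) // 2
            lower, upper = idx[order[:half]], idx[order[half:]]
            codes[upper, bit] = 1  # upper half gets bit 1, lower half bit 0
            next_groups.extend([lower, upper])
        groups = next_groups
    return codes

def decode_codes(pred_codes, vertices, codes):
    """Map predicted codes to 3D points (centroid of vertices sharing a code).

    Assumption: every predicted code also occurs in `codes`.
    """
    weights = 1 << np.arange(codes.shape[1] - 1, -1, -1)
    keys = codes.astype(np.int64) @ weights
    lookup = {k: vertices[keys == k].mean(axis=0) for k in np.unique(keys)}
    pred_keys = np.asarray(pred_codes, dtype=np.int64) @ weights
    return np.stack([lookup[k] for k in pred_keys])
```

In a full pipeline, a network would predict one such code per object pixel, decode_codes would turn those codes into 3D surface points, and the resulting 2D-3D pairs would be handed to a PnP solver (for example OpenCV's cv2.solvePnPRansac, named here only as an illustration).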