CV, RO · Sep 18, 2018

SilhoNet: An RGB Method for 6D Object Pose Estimation

arXiv:1809.06893v4 · 67 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of enabling autonomous robot manipulation in cost-sensitive or constrained environments where RGB-D sensors are unavailable. It represents an incremental improvement over existing monocular methods.

The paper tackles 6D object pose estimation from monocular RGB images, a challenging setting that arises when cost or environmental constraints rule out RGB-D sensors. It introduces SilhoNet, which predicts pose via intermediate silhouette representations and achieves better overall performance on the YCB-Video dataset than two state-of-the-art monocular pose estimation networks.

Autonomous robot manipulation involves estimating the translation and orientation of the object to be manipulated as a 6-degree-of-freedom (6D) pose. Methods using RGB-D data have shown great success in solving this problem. However, there are situations where cost constraints or the working environment may limit the use of RGB-D sensors. When limited to monocular camera data, the problem of object pose estimation is very challenging. In this work, we introduce a novel method called SilhoNet that predicts 6D object pose from monocular images. We use a Convolutional Neural Network (CNN) pipeline that takes in Region of Interest (ROI) proposals to simultaneously predict an intermediate silhouette representation for objects with an associated occlusion mask and a 3D translation vector. The 3D orientation is then regressed from the predicted silhouettes. We show that our method achieves better overall performance on the YCB-Video dataset than two state-of-the-art networks for 6D pose estimation from monocular image input.
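The abstract describes the output as a 6D pose: a predicted 3D translation vector plus a 3D orientation regressed from the silhouettes. As a minimal sketch of how those two pieces fit together, assuming the orientation is expressed as a quaternion (w, x, y, z) (an assumption for illustration; this page does not state SilhoNet's orientation representation), the prediction can be turned into a single rigid transform and applied to object model points. The helpers `quat_to_matrix`, `compose_pose`, and `transform_points` are hypothetical and are not taken from the paper's code.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float, float]


def quat_to_matrix(q: Sequence[float]) -> Matrix:
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix.

    The quaternion is normalized first, because a regressed quaternion is
    generally not exactly unit length (hypothetical helper).
    """
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    if n == 0.0:
        raise ValueError("quaternion must be non-zero")
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def compose_pose(q: Sequence[float], t: Sequence[float]) -> Matrix:
    """Stack rotation R and translation t into a 4x4 transform [R | t; 0 0 0 1]."""
    r = quat_to_matrix(q)
    return [
        r[0] + [float(t[0])],
        r[1] + [float(t[1])],
        r[2] + [float(t[2])],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_points(pose: Matrix, points: Sequence[Point]) -> List[Point]:
    """Map object-frame 3D points into the camera frame: p' = R p + t."""
    out: List[Point] = []
    for px, py, pz in points:
        out.append(tuple(
            pose[i][0] * px + pose[i][1] * py + pose[i][2] * pz + pose[i][3]
            for i in range(3)
        ))
    return out


if __name__ == "__main__":
    # A 90-degree rotation about z, then a 0.5 m shift along the camera axis.
    half = math.pi / 4
    pose = compose_pose((math.cos(half), 0.0, 0.0, math.sin(half)), (0.0, 0.0, 0.5))
    print(transform_points(pose, [(1.0, 0.0, 0.0)]))  # approximately (0, 1, 0.5)
```

Normalizing inside `quat_to_matrix` is a design choice for the sketch: it keeps the resulting matrix a valid rotation even when a network's raw output drifts off the unit sphere.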

Code Implementations: 2 repos
