RO · Sep 14, 2021

Simultaneous Object Reconstruction and Grasp Prediction using a Camera-centric Object Shell Representation

Nikhil Chavan-Dafle, Sergiy Popovych, Shubham Agrawal, Daniel D. Lee, Volkan Isler

arXiv:2109.06837v2 · 10.411 citations

Originality: Incremental advance

AI Analysis

This work addresses a fundamental challenge in robotic manipulation: enabling robots to grasp objects efficiently. It is an incremental advance built on a novel object representation.

The paper tackles the problem of robotic grasping by simultaneously reconstructing object meshes and predicting dense grasp quality maps from depth images, achieving over 90% accuracy in grasp quality estimation and more than 93% success rate in cluttered scenes.

Being able to grasp objects is a fundamental component of most robotic manipulation systems. In this paper, we present a new approach to simultaneously reconstruct a mesh and a dense grasp quality map of an object from a depth image. At the core of our approach is a novel camera-centric object representation called the "object shell" which is composed of an observed "entry image" and a predicted "exit image". We present an image-to-image residual ConvNet architecture in which the object shell and a grasp-quality map are predicted as separate output channels. The main advantage of the shell representation and the corresponding neural network architecture, ShellGrasp-Net, is that the input-output pixel correspondences in the shell representation are explicitly represented in the architecture. We show that this coupling yields superior generalization capabilities for object reconstruction and accurate grasp quality estimation implicitly considering the object geometry. Our approach yields an efficient dense grasp quality map and an object geometry estimate in a single forward pass. Both of these outputs can be used in a wide range of robotic manipulation applications. With rigorous experimental validation, both in simulation and on a real setup, we show that our shell-based method can be used to generate precise grasps and the associated grasp quality with over 90% accuracy. Diverse grasps computed on shell reconstructions allow the robot to select and execute grasps in cluttered scenes with more than 93% success rate.
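The shell representation described above can be illustrated with a minimal sketch. It is not the ShellGrasp-Net implementation; it only assumes that the entry and exit images are per-pixel depth maps in the camera frame, that an object mask is available, and that the camera follows a pinhole model. The function names (`backproject`, `shell_points`, `best_grasp_pixel`) and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and do not come from the paper.

```python
import numpy as np


def backproject(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels to camera-frame 3D points (pinhole model)."""
    v, u = np.nonzero(mask)          # rows (v) and columns (u) of object pixels
    z = depth[v, u]                  # depth along the optical axis at each pixel
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def shell_points(entry, exit_, mask, fx, fy, cx, cy):
    """Return front (entry) and back (exit) surface points of the shell.

    Row i of both arrays lies on the same camera ray, because both depth
    images share the same pixel grid.
    """
    front = backproject(entry, mask, fx, fy, cx, cy)
    back = backproject(exit_, mask, fx, fy, cx, cy)
    return front, back


def best_grasp_pixel(quality, mask):
    """Pick the object pixel with the highest predicted grasp quality."""
    masked = np.where(mask, quality, -np.inf)   # ignore background pixels
    v, u = np.unravel_index(np.argmax(masked), masked.shape)
    return int(v), int(u), float(masked[v, u])


# Toy 4x4 example: a flat object 0.1 m thick occupying the central 2x2 pixels.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
entry = np.full((4, 4), 0.5)         # observed depth (front surface)
exit_ = np.full((4, 4), 0.6)         # assumed predicted depth (back surface)
front, back = shell_points(entry, exit_, mask, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
```

Because the entry and exit depths are indexed by the same pixels, each front point pairs with a back point on the same camera ray. This per-pixel correspondence is what the abstract refers to, and it is what would allow the two point sets to be joined into a closed mesh.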
