MVGrasp: Real-Time Multi-View 3D Object Grasping in Highly Cluttered Environments
This work addresses the challenge of robotic grasping in highly cluttered settings. The contribution is incremental, as it builds on existing multi-view and deep learning techniques for robotic manipulation.
The paper tackles robust object grasping in cluttered, human-centric environments. It proposes a multi-view deep learning approach that takes an object's point cloud, generates orthographic views of it, and estimates pixel-wise grasp synthesis from those views. The authors report reliable closed-loop grasping of novel objects across various scenarios without fine-tuning.
Nowadays, robots play an increasingly important role in our daily lives. In human-centered environments, robots often encounter piles of objects, packed items, or isolated objects. A robot must therefore be able to grasp and manipulate different objects in various situations to help humans with daily tasks. In this paper, we propose a multi-view deep learning approach to robust object grasping in human-centric domains. In particular, our approach takes a point cloud of an arbitrary object as input and generates orthographic views of the object. The obtained views are then used to estimate pixel-wise grasp synthesis for each object. We train the model end-to-end using a small object grasp dataset and test it both in simulation and on real-world data without any further fine-tuning. To evaluate the proposed approach, we performed extensive experiments in three scenarios: isolated objects, packed items, and piles of objects. Experimental results show that our approach performs well in all simulation and real-robot scenarios and achieves reliable closed-loop grasping of novel objects across various scene configurations.
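The view-generation step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `orthographic_depth_views`, the choice of the three axis-aligned planes, the unit-cube normalization, and the `resolution` parameter are all assumptions made for illustration. The sketch projects an object's point cloud onto the XY, XZ, and YZ planes and stores, per pixel, the largest normalized coordinate along the projection axis.

```python
import numpy as np


def orthographic_depth_views(points, resolution=32):
    """Project an (N, 3) point cloud onto the XY, XZ, and YZ planes.

    Hypothetical helper for illustration only. Returns a dict mapping a
    plane name to a (resolution, resolution) depth image. Pixels that no
    point falls into stay at 0, so a point at normalized depth 0 is
    indistinguishable from an empty pixel in this simple sketch.
    """
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("points must have shape (N, 3)")

    # Normalize into the unit cube with one shared scale so the object's
    # aspect ratio is preserved across all three views.
    mins = points.min(axis=0)
    extent = max(float((points.max(axis=0) - mins).max()), 1e-9)
    unit = (points - mins) / extent

    # (column axis, row axis, depth axis) for each projection plane.
    planes = {"xy": (0, 1, 2), "xz": (0, 2, 1), "yz": (1, 2, 0)}
    views = {}
    for name, (u, v, d) in planes.items():
        cols = np.minimum((unit[:, u] * resolution).astype(int), resolution - 1)
        rows = np.minimum((unit[:, v] * resolution).astype(int), resolution - 1)
        image = np.zeros((resolution, resolution))
        # Unbuffered in-place maximum: keep the largest depth per pixel,
        # even when several points land in the same pixel.
        np.maximum.at(image, (rows, cols), unit[:, d])
        views[name] = image
    return views
```

In a pipeline of the kind the abstract describes, images like these would be the input to the network that predicts pixel-wise grasp synthesis. How the views are selected and rendered in the actual system is not specified here, so the three fixed planes should be read as a placeholder.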