CV · RO · Jul 12, 2021

End-to-end Trainable Deep Neural Network for Robotic Grasp Detection and Semantic Segmentation from RGB

arXiv:2107.05287v2 · 161 citations
Originality: Incremental advance
AI Analysis

This work addresses robotic grasping in cluttered environments, though it appears to be an incremental improvement that combines existing tasks.

The authors developed an end-to-end trainable CNN architecture that simultaneously performs robotic grasp detection and semantic segmentation from RGB images, achieving state-of-the-art accuracy on the Cornell and Jacquard datasets. They also introduced a refinement module that improves grasp detection by leveraging segmentation results and created a dataset extension for OCID to evaluate grasp detection in complex scenes.

In this work, we introduce a novel, end-to-end trainable CNN-based architecture to deliver high-quality results for grasp detection suitable for a parallel-plate gripper, and semantic segmentation. Utilizing this, we propose a novel refinement module that takes advantage of previously calculated grasp detection and semantic segmentation and further increases grasp detection accuracy. Our proposed network delivers state-of-the-art accuracy on two popular grasp datasets, namely Cornell and Jacquard. As an additional contribution, we provide a novel dataset extension for the OCID dataset, making it possible to evaluate grasp detection in highly challenging scenes. Using this dataset, we show that semantic segmentation can additionally be used to assign grasp candidates to object classes, which can be used to pick specific objects in the scene.
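The last point of the abstract, assigning grasp candidates to object classes through the segmentation output, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each grasp candidate is a tuple `(cx, cy, w, h, theta)` in pixel coordinates, that the segmentation is an integer class-id mask with `0` as background, and that a grasp simply inherits the class found at its centre pixel. The function names `assign_grasps_to_classes` and `grasps_for_class` are hypothetical.

```python
import numpy as np


def assign_grasps_to_classes(grasps, seg_mask, background=0):
    """Label each grasp with the class id under its centre pixel.

    grasps:   iterable of (cx, cy, w, h, theta), an assumed representation
    seg_mask: (H, W) integer array of per-pixel class ids (assumed format)
    Grasps centred on background or outside the image are dropped.
    """
    height, width = seg_mask.shape
    labelled = []
    for cx, cy, w, h, theta in grasps:
        col, row = int(round(cx)), int(round(cy))
        if not (0 <= row < height and 0 <= col < width):
            continue  # centre lies outside the image
        cls = int(seg_mask[row, col])
        if cls != background:
            labelled.append((cls, (cx, cy, w, h, theta)))
    return labelled


def grasps_for_class(labelled, target_cls):
    """Keep only the grasps assigned to one object class."""
    return [grasp for cls, grasp in labelled if cls == target_cls]


# Toy scene: two objects (class 1 and class 2) on a 4x6 mask.
seg_mask = np.zeros((4, 6), dtype=int)
seg_mask[1:3, 1:3] = 1
seg_mask[1:3, 4:6] = 2

grasps = [
    (1.2, 1.6, 2, 1, 0.0),    # centre on class 1
    (4.7, 2.0, 2, 1, 0.5),    # centre on class 2
    (0.0, 0.0, 2, 1, 0.0),    # centre on background, dropped
    (10.0, 10.0, 2, 1, 0.0),  # outside the image, dropped
]

labelled = assign_grasps_to_classes(grasps, seg_mask)
class_2_grasps = grasps_for_class(labelled, 2)
```

Looking up only the centre pixel keeps the sketch short. A majority vote over the pixels inside the rotated grasp rectangle would be a more robust variant, at the cost of extra geometry code.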

Code Implementations: 1 repo