MG-Grasp: Metric-Scale Geometric 6-DoF Grasping Framework with Sparse RGB Observations
MG-Grasp addresses the challenge of reliable robotic manipulation in environments where depth sensors are unavailable, offering a practical solution for RGB-only grasping systems.
The paper tackles 6-DoF robotic grasping from sparse RGB images without depth sensors. It proposes MG-Grasp, which reconstructs metric-scale point clouds and generates stable grasps, achieving state-of-the-art performance among RGB-based methods on the GraspNet-1Billion dataset and in real-world tests.
Single-view RGB-D grasp detection remains a common choice in 6-DoF robotic grasping systems, but it typically requires a depth sensor. Although RGB-only 6-DoF grasping methods have been studied recently, their inaccurate geometric representations are not directly suitable for physically reliable robotic manipulation and therefore hinder stable grasp generation. To address these limitations, we propose MG-Grasp, a novel depth-free 6-DoF grasping framework for high-quality object grasping. Leveraging a two-view 3D foundation model together with camera intrinsics and extrinsics, our method reconstructs metric-scale, multi-view consistent dense point clouds from sparse RGB images and generates stable 6-DoF grasps. Experiments on the GraspNet-1Billion dataset and in the real world demonstrate that MG-Grasp achieves state-of-the-art (SOTA) grasp performance among RGB-based 6-DoF grasping methods.
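The abstract does not spell out how metric scale is recovered from the two-view reconstruction. One common approach when calibrated cameras are available is to rescale an up-to-scale reconstruction by the ratio of the known camera baseline (from the extrinsics) to the predicted baseline, then back-project each view's depth into a shared world frame with the pinhole intrinsics. The sketch below illustrates that general idea only, under stated assumptions: the foundation model is assumed to output per-view depth maps and a relative camera translation at one unknown common scale, poses are camera-to-world `(R, t)`, and `metric_scale`, `backproject`, and `fuse_two_views` are hypothetical helpers, not MG-Grasp's actual implementation.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Pose = Tuple[Mat3, Vec3]  # assumed camera-to-world (R, t); camera center = t


def metric_scale(pred_baseline: Vec3, cam1_center: Vec3, cam2_center: Vec3) -> float:
    """Ratio of the known metric baseline to the predicted, up-to-scale baseline."""
    known = math.dist(cam1_center, cam2_center)
    pred = math.hypot(*pred_baseline)
    if pred == 0.0:
        raise ValueError("predicted baseline has zero length")
    return known / pred


def backproject(depth: List[List[float]], fx: float, fy: float, cx: float, cy: float,
                R: Mat3, t: Vec3, scale: float) -> List[Vec3]:
    """Lift an up-to-scale depth map to metric world-frame points (pinhole model)."""
    points: List[Vec3] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:
                continue  # skip invalid / missing depth
            z = scale * d  # metric depth along the optical axis
            x_cam = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Transform camera-frame point to world frame: X_w = R @ X_c + t
            world = tuple(
                sum(R[i][j] * x_cam[j] for j in range(3)) + t[i] for i in range(3)
            )
            points.append(world)
    return points


def fuse_two_views(depth1: List[List[float]], depth2: List[List[float]],
                   intrinsics: Tuple[float, float, float, float],
                   pose1: Pose, pose2: Pose, pred_baseline: Vec3) -> List[Vec3]:
    """Merge both views into one metric point cloud using a shared scale factor."""
    s = metric_scale(pred_baseline, pose1[1], pose2[1])
    return (backproject(depth1, *intrinsics, *pose1, s)
            + backproject(depth2, *intrinsics, *pose2, s))
```

For example, with identity rotations, cameras 0.1 m apart, and a predicted baseline of length 0.5, the scale factor is 0.2, so a predicted depth of 5.0 becomes 1.0 m. Using one shared scale for both views keeps the fused cloud consistent across views, which matters for downstream grasp generation.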