RO, CV | Jun 16, 2025

JENGA: Object selection and pose estimation for robotic grasping from a stack

arXiv:2506.13425v1 | h-index: 19 | IROS
Originality: Incremental advance
AI Analysis

The paper addresses robotic grasping from structured object formations in settings such as construction and warehouse automation. The advance is incremental: it builds on existing vision-based methods and applies them to a new scenario.

The paper tackles two linked problems: selecting suitable objects for grasping from a structured stack, and estimating the 6DoF pose of those objects. It proposes a camera-IMU approach and introduces a benchmark dataset and an evaluation metric. Experiments show good performance, but a completely error-free solution remains difficult.

Vision-based robotic object grasping is typically investigated in the context of isolated objects or unstructured object sets in bin picking scenarios. However, there are several settings, such as construction or warehouse automation, where a robot needs to interact with a structured object formation such as a stack. In this context, we define the problem of selecting suitable objects for grasping along with estimating an accurate 6DoF pose of these objects. To address this problem, we propose a camera-IMU based approach that prioritizes unobstructed objects on the higher layers of stacks and introduce a dataset for benchmarking and evaluation, along with a suitable evaluation metric that combines object selection with pose accuracy. Experimental results show that although our method can perform quite well, this is a challenging problem if a completely error-free solution is needed. Finally, we show results from the deployment of our method for a brick-picking application in a construction scenario.
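The selection rule described in the abstract (prefer unobstructed objects on the higher layers of a stack) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that candidate object centres are already available in the camera frame, that the IMU supplies a gravity vector in that same frame, and that an obstruction flag has been computed elsewhere. The names `Candidate`, `height_along_up`, and `rank_candidates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    position: tuple   # (x, y, z) object centre in the camera frame, metres
    obstructed: bool  # assumed input: True if another object rests on top


def height_along_up(position, gravity):
    """Signed height of a point along 'up', the direction opposite to gravity."""
    norm = sum(g * g for g in gravity) ** 0.5
    up = tuple(-g / norm for g in gravity)
    return sum(p * u for p, u in zip(position, up))


def rank_candidates(candidates, gravity):
    """Drop obstructed objects, then sort the rest from highest to lowest."""
    free = [c for c in candidates if not c.obstructed]
    return sorted(
        free,
        key=lambda c: height_along_up(c.position, gravity),
        reverse=True,
    )


# Example: camera y-axis points down, so gravity is +y in the camera frame.
gravity = (0.0, 9.81, 0.0)
stack = [
    Candidate("a", (0.0, 0.5, 1.0), obstructed=False),  # lower layer
    Candidate("b", (0.0, 0.2, 1.0), obstructed=False),  # higher layer
    Candidate("c", (0.0, 0.1, 1.0), obstructed=True),   # covered, skipped
]
ranked = rank_candidates(stack, gravity)
```

Using the IMU gravity vector to define "up" means the ranking does not depend on how the camera is tilted, which is one plausible reason to pair the camera with an IMU in this setting.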

