RO, AI | Jul 24, 2025

Equivariant Volumetric Grasping

Pinhao Song, Yutong Hu, Pengteng Li, Renaud Detry

arXiv:2507.18847v2 | 5.72 citations | h-index: 24

Originality: Incremental advance

AI Analysis

This work targets sample-efficient robotic grasp planning from volumetric scene representations; its contribution is an incremental advance specific to that domain.

The paper aims to improve the sample efficiency of volumetric grasp models by introducing a model that is equivariant to rotations about the vertical axis. Its projection-based tri-plane features reduce computational and memory costs, and the equivariant grasp planners built on them outperform their non-equivariant counterparts with only a modest computational overhead.

We propose a new volumetric grasp model that is equivariant to rotations around the vertical axis, leading to a significant improvement in sample efficiency. Our model employs a tri-plane volumetric feature representation -- i.e., the projection of 3D features onto three canonical planes. We introduce a novel tri-plane feature design in which features on the horizontal plane are equivariant to 90° rotations, while the sum of features from the other two planes remains invariant to the same transformations. This design is enabled by a new deformable steerable convolution, which combines the adaptability of deformable convolutions with the rotational equivariance of steerable ones. This allows the receptive field to adapt to local object geometry while preserving equivariance properties. We further develop equivariant adaptations of two state-of-the-art volumetric grasp planners, GIGA and IGD. Specifically, we derive a new equivariant formulation of IGD's deformable attention mechanism and propose an equivariant generative model of grasp orientations based on flow matching. We provide a detailed analytical justification of the proposed equivariance properties and validate our approach through extensive simulated and real-world experiments. Our results demonstrate that the proposed projection-based design significantly reduces both computational and memory costs. Moreover, the equivariant grasp models built on top of our tri-plane features consistently outperform their non-equivariant counterparts, achieving higher performance with only a modest computational overhead. Video and code can be viewed in: https://mousecpn.github.io/evg-page/
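The tri-plane symmetry described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: it assumes the three planes are obtained by plain mean pooling of a cubic voxel grid (the paper uses learned features and a deformable steerable convolution), and the helper names `triplane`, `rotate_volume`, `rotate_index`, and `check_symmetry` are hypothetical. Under those assumptions, it checks that a 90° rotation about the vertical axis rotates the horizontal plane in the same way, while the sum of the two vertical-plane features queried at a rotated point is unchanged.

```python
import numpy as np


def triplane(volume):
    """Mean-pool an (X, Y, Z) grid onto the xy, xz and yz planes.

    Assumption: mean pooling stands in for the learned projection.
    """
    xy = volume.mean(axis=2)  # horizontal plane, indexed [x, y]
    xz = volume.mean(axis=1)  # vertical plane, indexed [x, z]
    yz = volume.mean(axis=0)  # vertical plane, indexed [y, z]
    return xy, xz, yz


def rotate_volume(volume):
    """Rotate the grid by 90 degrees about the vertical (z) axis."""
    return np.rot90(volume, k=1, axes=(0, 1))


def rotate_index(x, y, n):
    """Where voxel column (x, y) lands after rotate_volume on an n x n grid."""
    return n - 1 - y, x


def check_symmetry(n=8, seed=0):
    """Return (horizontal plane is equivariant, vertical sum is invariant)."""
    rng = np.random.default_rng(seed)
    volume = rng.normal(size=(n, n, n))

    xy, xz, yz = triplane(volume)
    xy_r, xz_r, yz_r = triplane(rotate_volume(volume))

    # Equivariance: projecting the rotated grid equals rotating the projection.
    horizontal_ok = np.allclose(xy_r, np.rot90(xy, k=1, axes=(0, 1)))

    # Invariance: the summed vertical features at corresponding points agree.
    vertical_ok = True
    for x in range(n):
        for y in range(n):
            xr, yr = rotate_index(x, y, n)
            for z in range(n):
                before = xz[x, z] + yz[y, z]
                after = xz_r[xr, z] + yz_r[yr, z]
                vertical_ok = vertical_ok and bool(np.isclose(before, after))

    return bool(horizontal_ok), vertical_ok


if __name__ == "__main__":
    print(check_symmetry())
```

In this toy setting the two vertical planes swap roles under the rotation (one of them is also mirrored), which is why only their sum, rather than each plane on its own, stays invariant.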
