BiGraspFormer: End-to-End Bimanual Grasp Transformer
This work addresses coordination issues in bimanual robot grasping and offers a practical solution for robotics applications, though the contribution appears incremental because it builds on existing transformer-based methods.
The paper tackles bimanual grasping for robots that handle large objects. It proposes BiGraspFormer, an end-to-end transformer framework that generates coordinated bimanual grasps directly from object point clouds, and it reports better performance than existing methods with an inference time under 0.05 seconds.
Bimanual grasping is essential for robots to handle large and complex objects. However, existing methods either focus solely on single-arm grasping or employ separate grasp generation and bimanual evaluation stages, leading to coordination problems including collision risks and unbalanced force distribution. To address these limitations, we propose BiGraspFormer, a unified end-to-end transformer framework that directly generates coordinated bimanual grasps from object point clouds. Our key idea is the Single-Guided Bimanual (SGB) strategy, which first generates diverse single grasp candidates using a transformer decoder, then leverages their learned features through specialized attention mechanisms to jointly predict bimanual poses and quality scores. This conditioning strategy reduces the complexity of the 12-DoF search space while ensuring coordinated bimanual manipulation. Comprehensive simulation experiments and real-world validation demonstrate that BiGraspFormer consistently outperforms existing methods while maintaining efficient inference speed (<0.05s), confirming the effectiveness of our framework. Code and supplementary materials are available at https://sites.google.com/view/bigraspformer
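The Single-Guided Bimanual (SGB) idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that single-grasp candidates are already available as feature vectors with scores, that the top-scoring candidates act as guides, and that each 6-DoF pose is stored as six numbers (for example three for translation and three for rotation). The names `attend`, `linear`, `single_guided_bimanual`, `w_pose`, and `w_quality` are hypothetical, and the weights are random rather than learned.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over plain Python lists."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def linear(x, weight):
    """Apply a weight matrix (one row per output) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def single_guided_bimanual(single_feats, single_scores, w_pose, w_quality, top_k=2):
    # Assumption: the top-k single-grasp candidates serve as guides.
    order = sorted(range(len(single_feats)),
                   key=lambda i: single_scores[i], reverse=True)
    guides = [single_feats[i] for i in order[:top_k]]
    # Each guide attends over all single-grasp features, so the bimanual
    # prediction is conditioned on what the single-grasp stage produced.
    fused = attend(guides, single_feats, single_feats)
    results = []
    for f in fused:
        pose12 = linear(f, w_pose)  # 12 values: two 6-DoF poses
        quality = 1.0 / (1.0 + math.exp(-linear(f, w_quality)[0]))
        results.append((pose12[:6], pose12[6:], quality))
    return results


# Toy inputs: random features stand in for transformer decoder outputs.
rng = random.Random(0)
dim, num_single = 8, 5
single_feats = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_single)]
single_scores = [rng.random() for _ in range(num_single)]
w_pose = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(12)]
w_quality = [[rng.uniform(-1, 1) for _ in range(dim)]]

pairs = single_guided_bimanual(single_feats, single_scores, w_pose, w_quality)
for left, right, quality in pairs:
    print(len(left), len(right), round(quality, 3))
```

The point of the sketch is the conditioning structure: the pose and the quality score of each bimanual pair are predicted jointly from features that already encode plausible single grasps, instead of searching the full 12-DoF space from scratch.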