RoTri-Diff: A Spatial Robot-Object Triadic Interaction-Guided Diffusion Model for Bimanual Manipulation
This work reduces inter-arm collisions and unstable grasps in bimanual robotic manipulation, an incremental improvement for robotic control.
This paper addresses the challenge of bimanual manipulation by explicitly modeling the dynamic geometric relationship among the two robot arms and the manipulated object, called Robot-Object Triadic Interaction (RoTri). The proposed RoTri-Diff framework, a diffusion-based imitation learning approach, generates stable and coordinated trajectories, outperforming state-of-the-art baselines by 10.2% on 11 RLBench2 tasks and demonstrating stable performance on 4 real-world bimanual tasks.
Bimanual manipulation is a fundamental robotic skill that requires continuous and precise coordination between two arms. While imitation learning (IL) is the dominant paradigm for acquiring this capability, existing approaches, whether robot-centric or object-centric, often overlook the dynamic geometric relationship among the two arms and the manipulated object. This limitation frequently leads to inter-arm collisions, unstable grasps, and degraded performance in complex tasks. To address this, we introduce the Robot-Object Triadic Interaction (RoTri) representation for bimanual systems, which encodes the relative 6D poses among the two arms and the object to capture their spatial triadic relationship and establish continuous triangular geometric constraints. Building on this, we further introduce RoTri-Diff, a diffusion-based imitation learning framework that combines RoTri constraints with robot keyposes and object motion in a hierarchical diffusion process. This enables the generation of stable, coordinated trajectories and robust execution across different modes of bimanual manipulation. Extensive experiments show that our approach outperforms state-of-the-art baselines by 10.2% on 11 representative RLBench2 tasks and achieves stable performance on 4 challenging real-world bimanual tasks. Project website: https://rotri-diff.github.io/.
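
To make the idea of relative 6D poses among the two arms and the object more concrete, the following is a minimal sketch of how three pairwise relative poses could be computed from world-frame poses. It is an illustration under assumptions, not the paper's implementation: poses are assumed to be 4x4 homogeneous transforms, and the names `make_pose`, `rot_z`, `relative_pose`, and `triadic_relative_poses` are hypothetical. How RoTri-Diff actually parameterizes rotations or feeds these quantities into the diffusion process is not specified in the abstract.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def relative_pose(T_a, T_b):
    """Pose of frame b expressed in frame a, i.e. inv(T_a) @ T_b."""
    R_a = T_a[:3, :3]
    t_a = T_a[:3, 3]
    T_a_inv = np.eye(4)
    T_a_inv[:3, :3] = R_a.T          # inverse of a rotation is its transpose
    T_a_inv[:3, 3] = -R_a.T @ t_a    # inverse translation
    return T_a_inv @ T_b


def triadic_relative_poses(T_left, T_right, T_obj):
    """Three pairwise relative poses among left arm, right arm, and object.

    Hypothetical layout chosen for illustration; the paper may use a
    different set of pairs or a different pose parameterization.
    """
    return {
        "left_to_right": relative_pose(T_left, T_right),
        "left_to_object": relative_pose(T_left, T_obj),
        "right_to_object": relative_pose(T_right, T_obj),
    }


# Example world-frame poses (made-up values, in metres and radians).
T_left = make_pose(np.eye(3), [0.0, 0.0, 0.0])
T_right = make_pose(rot_z(np.pi), [0.6, 0.0, 0.0])
T_obj = make_pose(rot_z(np.pi / 2), [0.3, 0.0, 0.1])

triad = triadic_relative_poses(T_left, T_right, T_obj)

# The three relative poses close a triangle: composing left->object with
# object->right must reproduce left->right.
closure = triad["left_to_object"] @ np.linalg.inv(triad["right_to_object"])
print(np.allclose(closure, triad["left_to_right"]))
```

The closure check at the end reflects the triangular structure: any two of the three relative poses determine the third, so a generated trajectory that violates this relation is geometrically inconsistent. In this sketch the check is only a sanity test on the example poses.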