RO · AI · Apr 15

UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception

arXiv:2604.14089 · 53.5 · h-index: 1 · Has Code

Predicted impact: top 11% in RO · last 90 days · Originality: Incremental advance

AI Analysis

For embodied manipulation researchers, UMI-3D provides a robust, portable data collection system that improves policy performance and enables tasks previously infeasible with vision-only setups.

UMI-3D extends the Universal Manipulation Interface with a low-cost LiDAR sensor to overcome monocular SLAM failures due to occlusions and dynamic scenes, achieving higher success rates on standard tasks and enabling new tasks like deformable object manipulation.

We present UMI-3D, a multimodal extension of the Universal Manipulation Interface (UMI) for robust and scalable data collection in embodied manipulation. While UMI enables portable, wrist-mounted data acquisition, its reliance on monocular visual SLAM makes it vulnerable to occlusions, dynamic scenes, and tracking failures, limiting its applicability in real-world environments. UMI-3D addresses these limitations by introducing a lightweight and low-cost LiDAR sensor tightly integrated into the wrist-mounted interface, enabling LiDAR-centric SLAM with accurate metric-scale pose estimation under challenging conditions. We further develop a hardware-synchronized multimodal sensing pipeline and a unified spatiotemporal calibration framework that aligns visual observations with LiDAR point clouds, producing consistent 3D representations of demonstrations. Despite maintaining the original 2D visuomotor policy formulation, UMI-3D significantly improves the quality and reliability of collected data, which directly translates into enhanced policy performance. Extensive real-world experiments demonstrate that UMI-3D not only achieves high success rates on standard manipulation tasks, but also enables learning of tasks that are challenging or infeasible for the original vision-only UMI setup, including large deformable object manipulation and articulated object operation. The system supports an end-to-end pipeline for data acquisition, alignment, training, and deployment, while preserving the portability and accessibility of the original UMI. All hardware and software components are open-sourced to facilitate large-scale data collection and accelerate research in embodied intelligence: https://umi-3d.github.io.
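To make the idea of aligning visual observations with LiDAR point clouds concrete, the following is a minimal sketch of a generic camera-LiDAR alignment step, not the UMI-3D implementation. It assumes a standard pinhole camera model, a known 4x4 extrinsic transform from the LiDAR frame to the camera frame, and a scalar time offset between the two sensor clocks. The names `project_lidar_to_image`, `nearest_frame_index`, `T_cam_lidar`, `K`, and `time_offset` are hypothetical and chosen only for illustration.

```python
import numpy as np


def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project Nx3 LiDAR points into pixel coordinates.

    T_cam_lidar (4x4) and K (3x3) are assumed to come from a prior
    calibration; this is an illustrative pinhole projection only.
    Returns (pixels, depths) for the points in front of the camera.
    """
    points_lidar = np.asarray(points_lidar, dtype=float)
    n = points_lidar.shape[0]

    # Spatial alignment: move points from the LiDAR frame to the camera frame.
    homogeneous = np.hstack([points_lidar, np.ones((n, 1))])
    points_cam = (T_cam_lidar @ homogeneous.T).T[:, :3]

    # Discard points behind (or on) the image plane.
    in_front = points_cam[:, 2] > 1e-6
    points_cam = points_cam[in_front]

    # Pinhole projection followed by perspective division.
    uvw = (K @ points_cam.T).T
    pixels = uvw[:, :2] / uvw[:, 2:3]
    return pixels, points_cam[:, 2]


def nearest_frame_index(lidar_stamp, camera_stamps, time_offset=0.0):
    """Temporal alignment: index of the camera frame closest to a LiDAR scan.

    time_offset is a hypothetical constant correction added to the LiDAR
    timestamp before matching.
    """
    camera_stamps = np.asarray(camera_stamps, dtype=float)
    return int(np.argmin(np.abs(camera_stamps - (lidar_stamp + time_offset))))
```

With hardware-synchronized sensors the time offset would ideally be close to zero; the parameter is kept here only to show where a temporal correction would enter.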
