RO | AI | LG | Oct 31, 2024

3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing

arXiv:2410.24091v2 | 109 citations | h-index: 7 | CoRL
Originality: Incremental advance
AI Analysis

The paper addresses dexterous bimanual manipulation for robots, particularly safe interaction with fragile objects. It appears incremental, since it builds on existing multi-modal sensing and imitation learning methods.

To enable fine-grained robot manipulation, the paper introduces 3D-ViTac, a system that integrates low-cost tactile sensors with visual data into a unified 3D representation. It reports that the resulting policies significantly outperform vision-only policies in tasks such as handling fragile items and long-horizon in-hand manipulation.

Tactile and visual perception are both crucial for humans to perform fine-grained interactions with their environment. Developing similar multi-modal sensing capabilities for robots can significantly enhance and expand their manipulation skills. This paper introduces 3D-ViTac, a multi-modal sensing and learning system designed for dexterous bimanual manipulation. Our system features tactile sensors equipped with dense sensing units, each covering an area of 3 mm². These sensors are low-cost and flexible, providing detailed and extensive coverage of physical contacts, effectively complementing visual information. To integrate tactile and visual data, we fuse them into a unified 3D representation space that preserves their 3D structures and spatial relationships. The multi-modal representation can then be coupled with diffusion policies for imitation learning. Through concrete hardware experiments, we demonstrate that even low-cost robots can perform precise manipulations and significantly outperform vision-only policies, particularly in safe interactions with fragile items and executing long-horizon tasks involving in-hand manipulation. Our project page is available at https://binghao-huang.github.io/3D-ViTac/.
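
The fusion step described in the abstract can be pictured with a small sketch. This is an illustration only, not the authors' implementation: it assumes the camera yields a point cloud in a common world frame, that each tactile pad's pose is known (for example from forward kinematics), and that both modalities are merged into one point set with extra per-point channels. The function name `fuse_visuo_tactile`, the five-column layout, and the 4x4 pad with 3 mm spacing in the usage lines are all hypothetical choices.

```python
import numpy as np


def fuse_visuo_tactile(visual_xyz, taxel_xyz_local, taxel_readings, sensor_pose):
    """Merge camera points and tactile points into one (N, 5) array.

    Columns: x, y, z, reading, is_tactile.
    This channel layout is an assumption made for illustration.
    """
    visual_xyz = np.asarray(visual_xyz, dtype=float)
    taxel_xyz_local = np.asarray(taxel_xyz_local, dtype=float)
    readings = np.asarray(taxel_readings, dtype=float).reshape(-1)
    sensor_pose = np.asarray(sensor_pose, dtype=float)

    # Move taxel positions from the sensor frame into the world frame
    # using the 4x4 homogeneous pose of the pad.
    rotation, translation = sensor_pose[:3, :3], sensor_pose[:3, 3]
    taxel_xyz_world = taxel_xyz_local @ rotation.T + translation

    # Visual points carry no tactile reading and are flagged with 0.
    visual = np.hstack([visual_xyz, np.zeros((len(visual_xyz), 2))])

    # Tactile points carry their reading and are flagged with 1.
    tactile = np.hstack(
        [taxel_xyz_world, readings[:, None], np.ones((len(readings), 1))]
    )

    # One point set: spatial relationships between the two modalities
    # are preserved because both now live in the same 3D frame.
    return np.vstack([visual, tactile])


# Usage with synthetic data (all values are made up for the example).
rng = np.random.default_rng(0)
visual = rng.uniform(-0.1, 0.1, size=(512, 3))

# Hypothetical 4x4 pad with 3 mm spacing, lying flat in its own frame.
u, v = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
taxels = np.stack([u.ravel() * 0.003, v.ravel() * 0.003, np.zeros(16)], axis=1)
readings = rng.uniform(0.0, 1.0, size=16)

# Pad placed 0.5 m above the world origin, with no rotation.
pose = np.eye(4)
pose[:3, 3] = [0.0, 0.0, 0.5]

cloud = fuse_visuo_tactile(visual, taxels, readings, pose)
print(cloud.shape)  # (528, 5)
```

A fused array of this kind could then serve as the observation for a point-based policy encoder. How the paper actually encodes, samples, or normalizes the points is not stated in the abstract and is left open here.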

