RO, CV | Dec 7, 2022

See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation

Stanford
arXiv:2212.03858v2 | 111 citations | h-index: 142
Originality: Incremental advance
AI Analysis

The paper addresses complex manipulation tasks for robots with a multisensory approach. Combining the three senses is novel, though each sensing modality is already established.

It integrates visual, auditory, and tactile perception for robotic manipulation, yielding a system that significantly outperforms prior methods on dense packing and pouring.

Humans use all of their senses to accomplish different tasks in everyday activities. In contrast, existing work on robotic manipulation mostly relies on one, or occasionally two modalities, such as vision and touch. In this work, we systematically study how visual, auditory, and tactile perception can jointly help robots to solve complex manipulation tasks. We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor, with all three sensory modalities fused with a self-attention model. Results on two challenging tasks, dense packing and pouring, demonstrate the necessity and power of multisensory perception for robotic manipulation: vision displays the global status of the robot but can often suffer from occlusion, audio provides immediate feedback of key moments that are even invisible, and touch offers precise local geometry for decision making. Leveraging all three modalities, our robotic system significantly outperforms prior methods.
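The fusion step described in the abstract can be illustrated with a minimal sketch of scaled dot-product self-attention over three modality feature vectors. This is not the authors' implementation: the function name `self_attention_fuse`, the toy 4-dimensional features, the identity query/key/value projections, and the mean-pooling at the end are all assumptions made for illustration. A real model would use learned encoders and learned projections.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention_fuse(tokens):
    """Fuse per-modality feature vectors with scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), which keeps the sketch short.
    Returns the mean-pooled fused vector and the attention weight matrix.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    # weights[i][j]: how much modality i attends to modality j
    weights = [softmax([dot(q, k) / scale for k in tokens]) for q in tokens]
    # Each attended token is a weighted sum of all modality tokens
    attended = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    # Mean-pool the attended tokens into one fused feature vector (assumption)
    fused = [sum(tok[j] for tok in attended) / len(attended) for j in range(d)]
    return fused, weights


# Hypothetical toy features standing in for encoder outputs
vision = [0.9, 0.1, 0.0, 0.2]
audio = [0.0, 0.8, 0.1, 0.0]
touch = [0.1, 0.0, 0.7, 0.3]

fused, weights = self_attention_fuse([vision, audio, touch])
```

Each row of `weights` sums to 1, so every modality's output is a convex combination of all three inputs. That is the property that lets one sense compensate when another is uninformative, for example when vision is occluded.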
