CV, RO | Jan 31, 2024

Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong

arXiv:2401.18084v1 | 30.4143 citations | h-index: 7 | CVPR

Originality: Highly original

AI Analysis

This work addresses two obstacles to multimodal learning with touch: expensive tactile data collection and non-standardized sensor outputs. It offers a novel approach to integrating touch with vision, language, and sound.

The paper tackles the challenge of multimodal learning with touch by introducing UniTouch, a unified tactile model that aligns embeddings from vision-based touch sensors to pretrained image embeddings, enabling zero-shot tasks like robot grasping prediction and touch image question answering.

The ability to associate touch with other modalities has huge implications for humans and computational systems. However, multimodal learning with touch remains challenging due to the expensive data collection process and non-standardized sensor outputs. We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound. We achieve this by aligning our UniTouch embeddings to pretrained image embeddings already associated with a variety of other modalities. We further propose learnable sensor-specific tokens, allowing the model to learn from a set of heterogeneous tactile sensors, all at the same time. UniTouch is capable of conducting various touch sensing tasks in the zero-shot setting, from robot grasping prediction to touch image question answering. To the best of our knowledge, UniTouch is the first to demonstrate such capabilities. Project page: https://cfeng16.github.io/UniTouch/
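The alignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a CLIP-style symmetric contrastive (InfoNCE) objective, which the abstract does not specify, and it stands in a single linear projection for the tactile encoder. The names `encode_touch`, `symmetric_info_nce`, `sensor_tokens`, and `weight`, as well as all shapes and the temperature value, are hypothetical. The image embeddings are treated as fixed targets, mirroring the idea of aligning to a pretrained image space.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def encode_touch(features, sensor_ids, sensor_tokens, weight):
    """Toy tactile encoder (assumption): prepend a per-sensor token, then project.

    features:      (batch, feat_dim) tactile features
    sensor_ids:    (batch,) integer index of the sensor that produced each sample
    sensor_tokens: (num_sensors, token_dim) learnable sensor-specific tokens
    weight:        (token_dim + feat_dim, embed_dim) projection matrix
    """
    tokens = sensor_tokens[sensor_ids]               # look up one token per sample
    joined = np.concatenate([tokens, features], axis=1)
    return joined @ weight


def symmetric_info_nce(touch_emb, image_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of each input is a matched touch/image pair."""
    t = l2_normalize(touch_emb)
    v = l2_normalize(image_emb)
    logits = t @ v.T / temperature                   # (batch, batch) similarity matrix
    idx = np.arange(len(t))
    loss_t2v = -log_softmax(logits, axis=1)[idx, idx].mean()  # touch -> image
    loss_v2t = -log_softmax(logits, axis=0)[idx, idx].mean()  # image -> touch
    return 0.5 * (loss_t2v + loss_v2t)


# Random stand-in data: 4 paired samples from 2 hypothetical sensors.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 16))
sensor_ids = np.array([0, 1, 0, 1])
sensor_tokens = rng.normal(size=(2, 4))
weight = rng.normal(size=(20, 8))
image_emb = rng.normal(size=(4, 8))                  # stands in for frozen image embeddings

touch_emb = encode_touch(features, sensor_ids, sensor_tokens, weight)
loss = symmetric_info_nce(touch_emb, image_emb)
print(float(loss))
```

In a real training loop, `sensor_tokens` and the encoder parameters would be updated by gradient descent on this loss while the image embeddings stay fixed. Because the image space is already associated with other modalities, a touch embedding aligned to it can then be compared against those modalities without touch-specific labels.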
