CV | AI | RO | Dec 18, 2025

OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction

arXiv:2512.16842v1 | 6 citations | h-index: 6
Originality: Synthesis-oriented
AI Analysis

OpenTouch addresses the limited multimodal perception available to researchers in egocentric vision and robotics by providing a new dataset and benchmarks. The contribution is incremental, since it builds on existing tactile sensing efforts.

The authors address the lack of in-the-wild datasets that align first-person video with full-hand touch. They present OpenTouch, a dataset with 5.1 hours of synchronized video-touch-pose data and 2,900 curated clips, and show that tactile signals improve grasp understanding and cross-modal alignment.

The human hand is our primary interface to the physical world, yet egocentric perception rarely knows when, where, or how forcefully it makes contact. Robust wearable tactile sensors are scarce, and no existing in-the-wild datasets align first-person video with full-hand touch. To bridge the gap between visual perception and physical interaction, we present OpenTouch, the first in-the-wild egocentric full-hand tactile dataset, containing 5.1 hours of synchronized video-touch-pose data and 2,900 curated clips with detailed text annotations. Using OpenTouch, we introduce retrieval and classification benchmarks that probe how touch grounds perception and action. We show that tactile signals provide a compact yet powerful cue for grasp understanding, strengthen cross-modal alignment, and can be reliably retrieved from in-the-wild video queries. By releasing this annotated vision-touch-pose dataset and benchmark, we aim to advance multimodal egocentric perception, embodied learning, and contact-rich robotic manipulation.

