VTouch++: A Multimodal Dataset with Vision-Based Tactile Enhancement for Bimanual Manipulation
This work addresses the lack of rich datasets for bimanual manipulation in robotics, though the contribution is incremental because it builds on existing vision-based tactile sensing methods.
The paper tackles the challenge of bimanual manipulation in contact-rich tasks by introducing the VTOUCH dataset, which combines vision-based tactile sensing for high-fidelity physical interaction signals, a systematic task design, and scalable automated data collection. It demonstrates the dataset's effectiveness through cross-modal retrieval experiments and real-robot evaluations.
Embodied intelligence has advanced rapidly in recent years; however, bimanual manipulation, especially in contact-rich tasks, remains challenging. This is largely due to the lack of datasets with rich physical interaction signals, systematic task organization, and sufficient scale. To address these limitations, we introduce the VTOUCH dataset. It leverages vision-based tactile sensing to provide high-fidelity physical interaction signals, adopts a matrix-style task design to enable systematic learning, and employs automated data collection pipelines covering real-world, demand-driven scenarios to ensure scalability. To validate the effectiveness of the dataset, we conduct extensive quantitative experiments on cross-modal retrieval as well as real-robot evaluation. Finally, we demonstrate real-world performance through generalizable inference across multiple robots, policies, and tasks.