"Touching to See" and "Seeing to Feel": Robotic Cross-modal SensoryData Generation for Visual-Tactile Perception
This work addresses the problem of limited perceivable dimensionality in robotic perception by enabling cross-modal data generation, which is incremental as it builds on existing GAN methods for a specific domain.
The paper tackles the challenge of integrating visual and tactile perception for robotics by proposing a cross-modal sensory data generation framework using conditional GANs, which generates realistic pseudo-visual images or tactile outputs from the other modality, improving classification performance on cloth textures in the ViTac dataset.
The integration of visual-tactile stimulus is common while humans performing daily tasks. In contrast, using unimodal visual or tactile perception limits the perceivable dimensionality of a subject. However, it remains a challenge to integrate the visual and tactile perception to facilitate robotic tasks. In this paper, we propose a novel framework for the cross-modal sensory data generation for visual and tactile perception. Taking texture perception as an example, we apply conditional generative adversarial networks to generate pseudo visual images or tactile outputs from data of the other modality. Extensive experiments on the ViTac dataset of cloth textures show that the proposed method can produce realistic outputs from other sensory inputs. We adopt the structural similarity index to evaluate similarity of the generated output and real data and results show that realistic data have been generated. Classification evaluation has also been performed to show that the inclusion of generated data can improve the perception performance. The proposed framework has potential to expand datasets for classification tasks, generate sensory outputs that are not easy to access, and also advance integrated visual-tactile perception.