Multi-modal perception for soft robotic interactions using generative models
This work addresses the challenge of robust perception for soft robots in unstructured environments, though it appears incremental because it builds on existing multi-modal fusion methods.
The paper tackles the problem of enabling soft robots to interact in unstructured environments by developing a perception model that integrates touch, vision, and proprioception to create a holistic state representation, using a generative model to predict touch from other modalities.
Perception is essential for the active interaction of physical agents with the external environment. The integration of multiple sensory modalities, such as touch and vision, enhances this perceptual process, creating a more comprehensive and robust understanding of the world. Such fusion is particularly useful for highly deformable bodies such as soft robots. Developing a compact yet comprehensive state representation from multi-sensory inputs can pave the way for the development of complex control strategies. This paper introduces a perception model that harmonizes data from diverse modalities to build a holistic state representation and assimilate essential information. The model relies on the causality between sensory input and robotic actions, employing a generative model to efficiently compress fused information and predict the next observation. We present, for the first time, a study on how touch can be predicted from vision and proprioception on soft robots, the importance of cross-modal generation, and why it is essential for soft robotic interactions in unstructured environments.
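The data flow described in the abstract (fuse vision, proprioception, and the action into a compact state, then predict touch from it) can be illustrated with a minimal sketch. This is not the paper's architecture: the abstract does not specify one, so every dimension, function name, and layer below is a hypothetical choice made for illustration. The weights are random and untrained, and a plain deterministic encoder-decoder stands in for the generative model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
VISION_DIM, PROPRIO_DIM, ACTION_DIM = 32, 6, 3
TOUCH_DIM, LATENT_DIM = 8, 4


def init_linear(in_dim, out_dim):
    """Return random (weights, bias) for an untrained linear layer."""
    weights = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return weights, np.zeros(out_dim)


enc_w, enc_b = init_linear(VISION_DIM + PROPRIO_DIM + ACTION_DIM, LATENT_DIM)
dec_w, dec_b = init_linear(LATENT_DIM, TOUCH_DIM)


def encode(vision, proprio, action):
    """Fuse the input modalities and the action into a compact latent state."""
    fused = np.concatenate([vision, proprio, action], axis=-1)
    return np.tanh(fused @ enc_w + enc_b)


def predict_touch(latent):
    """Decode the latent state into a predicted next touch reading."""
    return latent @ dec_w + dec_b


def prediction_loss(predicted, target):
    """Mean squared error between predicted and measured touch."""
    return float(np.mean((predicted - target) ** 2))


# Synthetic batch standing in for real sensor data.
batch = 5
vision = rng.normal(size=(batch, VISION_DIM))
proprio = rng.normal(size=(batch, PROPRIO_DIM))
action = rng.normal(size=(batch, ACTION_DIM))
touch_next = rng.normal(size=(batch, TOUCH_DIM))

latent = encode(vision, proprio, action)
touch_hat = predict_touch(latent)
loss = prediction_loss(touch_hat, touch_next)
```

Touch appears only as the prediction target, never as an input, which is what makes the prediction cross-modal. In a trained model, minimizing this loss would push the latent state to retain whatever vision and proprioception reveal about contact.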