RO · CV · Sep 15, 2021

A Framework for Multisensory Foresight for Embodied Agents

arXiv:2109.07561v1 · 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of multisensory foresight for robots and autonomous systems, offering an incremental advance over single-modality methods.

The paper tackles the problem of predicting future sensory states for embodied agents. It proposes an unsupervised neural network framework that combines multiple sensory modalities, and reports that adding non-visual modalities makes visual frame prediction more accurate.

Predicting future sensory states is crucial for learning agents such as robots, drones, and autonomous vehicles. In this paper, we couple multiple sensory modalities with exploratory actions and propose a predictive neural network architecture to address this problem. Most existing approaches rely on large, manually annotated datasets, or only use visual data as a single modality. In contrast, the unsupervised method presented here uses multi-modal perceptions for predicting future visual frames. As a result, the proposed model is more comprehensive and can better capture the spatio-temporal dynamics of the environment, leading to more accurate visual frame prediction. The other novelty of our framework is the use of sub-networks dedicated to anticipating future haptic, audio, and tactile signals. The framework was tested and validated with a dataset containing 4 sensory modalities (vision, haptic, audio, and tactile) on a humanoid robot performing 9 behaviors multiple times on a large set of objects. While the visual information is the dominant modality, utilizing the additional non-visual modalities improves the accuracy of predictions.
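The data flow the abstract describes (several modalities plus the executed behavior go in, a predicted next state per modality comes out) can be sketched roughly as below. This is a minimal illustration, not the authors' architecture: the feature sizes, the latent size, the linear encoders, fusion by concatenation, and the names `MultisensoryPredictor`, `predict_next`, and `prediction_loss` are all assumptions, and the weights are random and untrained. The only details taken from the abstract are the four modalities, the 9 behaviors, and the presence of a separate prediction sub-network per modality.

```python
import math
import random

# Hypothetical per-modality feature sizes; the abstract does not give these.
MODALITY_DIMS = {"vision": 16, "haptic": 6, "audio": 8, "tactile": 4}
NUM_BEHAVIORS = 9  # the dataset covers 9 behaviors
LATENT = 5         # hypothetical per-modality latent size


def make_linear(n_in, n_out, rng):
    """Random weight matrix with n_out rows of n_in weights (untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, x):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class MultisensoryPredictor:
    """Assumed layout: one encoder and one prediction head per modality."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.encoders = {m: make_linear(d, LATENT, rng) for m, d in MODALITY_DIMS.items()}
        fused_size = LATENT * len(MODALITY_DIMS) + NUM_BEHAVIORS
        # Vision head plus the three non-visual sub-networks.
        self.heads = {m: make_linear(fused_size, d, rng) for m, d in MODALITY_DIMS.items()}

    def predict_next(self, observation, behavior):
        """Map the current features and a behavior index to next-step features."""
        fused = []
        for m in MODALITY_DIMS:
            encoded = apply_linear(self.encoders[m], observation[m])
            fused.extend(math.tanh(v) for v in encoded)
        # Condition on the executed behavior with a one-hot vector.
        fused.extend(1.0 if i == behavior else 0.0 for i in range(NUM_BEHAVIORS))
        return {m: apply_linear(self.heads[m], fused) for m in MODALITY_DIMS}


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prediction_loss(model, obs_t, behavior, obs_next):
    """The label-free training signal: error against the observed next state."""
    pred = model.predict_next(obs_t, behavior)
    return sum(mse(pred[m], obs_next[m]) for m in MODALITY_DIMS)
```

The loss needs no manual annotation because the target is simply the next recorded observation, which is what makes this kind of objective unsupervised. Dropping a modality from `MODALITY_DIMS` gives a simple way to compare a vision-only variant against the multisensory one.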

Code Implementations: 1 repo
