CV · AI · LG · RO · Mar 4, 2019

Selective Sensor Fusion for Neural Visual-Inertial Odometry

arXiv:1903.01534v1 · 156 citations
Originality: Incremental advance
AI Analysis

This work addresses robustness issues in VIO for applications such as autonomous driving and drones. It is an incremental advance: it builds on existing deep learning approaches to VIO and contributes new fusion strategies.

The paper tackles robust trajectory estimation in Visual-Inertial Odometry (VIO). It proposes a selective sensor fusion framework that copes with imperfect sensory data, such as missing or corrupted inputs, and reports better results than direct fusion baselines on three public datasets.

Deep learning approaches for Visual-Inertial Odometry (VIO) have proven successful, but they rarely focus on incorporating robust fusion strategies for dealing with imperfect input sensory data. We propose a novel end-to-end selective sensor fusion framework for monocular VIO, which fuses monocular images and inertial measurements in order to estimate the trajectory whilst improving robustness to real-life issues, such as missing and corrupted data or bad sensor synchronization. In particular, we propose two fusion modalities based on different masking strategies: deterministic soft fusion and stochastic hard fusion, and we compare with previously proposed direct fusion baselines. During testing, the network is able to selectively process the features of the available sensor modalities and produce a trajectory at scale. We present a thorough investigation of the performance on three public autonomous driving, Micro Aerial Vehicle (MAV) and hand-held VIO datasets. The results demonstrate the effectiveness of the fusion strategies, which offer better performance compared to direct fusion, particularly in the presence of corrupted data. In addition, we study the interpretability of the fusion networks by visualising the masking layers in different scenarios and with varying data corruption, revealing interesting correlations between the fusion networks and imperfect sensory input data.
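The difference between the three fusion modes named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the single linear layer that produces the mask, the feature sizes, and the random weights are all hypothetical. The hard mask here is drawn by plain Bernoulli sampling as a stand-in, since the abstract does not say how the stochastic mask is sampled or trained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def direct_fusion(visual, inertial):
    """Baseline: concatenate the two feature vectors unchanged."""
    return np.concatenate([visual, inertial])


def soft_fusion(visual, inertial, weights, bias):
    """Deterministic soft mask: each feature is scaled by a value in (0, 1)."""
    features = np.concatenate([visual, inertial])
    # Hypothetical mask generator: one linear layer followed by a sigmoid.
    mask = sigmoid(weights @ features + bias)
    return mask * features, mask


def hard_fusion(visual, inertial, weights, bias, rng):
    """Stochastic hard mask: each feature is either kept (1) or dropped (0)."""
    features = np.concatenate([visual, inertial])
    keep_prob = sigmoid(weights @ features + bias)
    # Assumption: independent Bernoulli draws, used only for illustration.
    mask = (rng.random(features.shape) < keep_prob).astype(features.dtype)
    return mask * features, mask


# Toy inputs standing in for encoder outputs (sizes are arbitrary).
rng = np.random.default_rng(0)
visual = rng.normal(size=4)
inertial = rng.normal(size=3)
dim = visual.size + inertial.size
weights = rng.normal(scale=0.1, size=(dim, dim))
bias = np.zeros(dim)

direct_out = direct_fusion(visual, inertial)
soft_out, soft_mask = soft_fusion(visual, inertial, weights, bias)
hard_out, hard_mask = hard_fusion(visual, inertial, weights, bias, rng)
```

In this sketch the mask depends on both modalities, so a degraded visual feature vector can change how much weight the inertial features receive, and vice versa. Training a binary mask end to end would additionally require a differentiable relaxation or gradient estimator, which this sketch omits.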

