CV | Sep 8, 2023

Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry

Akankshya Kar, Sajal Maheshwari, Shamit Lal, Vinay Sameer Raja Kad

arXiv:2309.04147v1 | 1.51 citations | h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses visual odometry for robotics in challenging scenarios such as low-texture images, but it is an incremental contribution because it builds on prior self-supervised approaches.

The paper tackles drift in self-supervised visual odometry with a method that uses optical flow, RNNs, and GANs to improve depth and pose estimation, yielding more realistic generated images with fewer artifacts.

Visual odometry (VO) and SLAM have relied on multi-view geometry via local structure from motion for decades. These methods are at a slight disadvantage in challenging scenarios such as low-texture images and dynamic scenes. Meanwhile, deep neural networks are now ubiquitous in computer vision for extracting high-level features. For VO, such networks can estimate depth and pose from these high-level features. The VO task can then be modeled as an image generation task in which the pose estimate is a by-product. This can also be done in a self-supervised manner, removing the need for the large labeled datasets that supervised training of deep neural networks requires. Although prior work has tried a similar approach [1], its depth and pose estimates are sometimes imprecise, which leads to an accumulation of error (drift) along the trajectory. The goal of this work is to address these limitations and to develop a method that produces better depth and pose estimates. Two approaches are explored: 1) Modeling: optical flow and recurrent neural networks (RNNs) are used to exploit spatio-temporal correlations, which can provide more information for estimating depth. 2) Loss function: a generative adversarial network (GAN) [2] is deployed to improve the depth estimation (and thereby the pose estimation), as shown in Figure 1. This additional loss term improves the realism of the generated images and reduces artifacts.
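To make the "VO as image generation" idea concrete, here is a minimal NumPy sketch of a generic self-supervised view-synthesis objective with an added adversarial term. It is not the authors' code and omits the optical-flow and RNN components. It assumes grayscale images, a pinhole camera with known intrinsics K, a 4x4 target-to-source pose T, bilinear sampling with border clamping, a plain L1 photometric loss, and the standard non-saturating GAN generator loss; all function names and the weight `lam` are hypothetical.

```python
import numpy as np

def backproject(depth, K_inv):
    """Lift every target pixel (u, v) to a 3-D camera-frame point using its depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # 3 x N homogeneous pixels
    return (K_inv @ pix) * depth.ravel()  # scale each viewing ray by its depth

def project(points, K, T):
    """Move points into the source frame with pose T, then project them with K."""
    hom = np.vstack([points, np.ones((1, points.shape[1]))])  # 4 x N
    cam = (T @ hom)[:3]  # 3 x N points in the source camera frame
    pix = K @ cam
    z = np.clip(pix[2], 1e-6, None)  # guard against division by zero
    return pix[0] / z, pix[1] / z

def bilinear_sample(img, x, y):
    """Sample img at real-valued (x, y); out-of-view coordinates clamp to the border."""
    h, w = img.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def synthesize_target(src, depth, K, T):
    """Generate the target view by warping the source image with depth and pose."""
    points = backproject(depth, np.linalg.inv(K))
    x, y = project(points, K, T)
    return bilinear_sample(src, x, y).reshape(depth.shape)

def photometric_loss(target, synth):
    """Mean absolute (L1) difference between the real and synthesized target."""
    return float(np.mean(np.abs(target - synth)))

def generator_adv_loss(d_fake):
    """Non-saturating generator loss -log D(synth), given discriminator probabilities."""
    return float(-np.mean(np.log(np.clip(d_fake, 1e-7, 1.0))))

def total_loss(target, synth, d_fake, lam=0.01):
    """Photometric reconstruction loss plus a weighted adversarial term."""
    return photometric_loss(target, synth) + lam * generator_adv_loss(d_fake)
```

For example, with constant depth 10, focal length 10, and a pose that translates the camera by 1 unit along x, `synthesize_target` shifts the source image by one pixel; with an identity pose it reproduces the source. Because depth and pose only enter through the warp, minimizing this loss trains both networks without ground-truth labels, and the adversarial term penalizes synthesized views that a discriminator can tell apart from real frames.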
