Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry
This work aims to improve visual odometry for robotics and autonomous vehicles with a self-supervised approach; it appears incremental, as it builds on existing adversarial and sequential methods.
The paper tackles inaccurate depth and pose estimation in self-supervised visual odometry by proposing a sequential adversarial learning framework. The authors report more accurate depth with details preserved, and pose estimates that significantly outperform state-of-the-art self-supervised methods, on the KITTI and Cityscapes datasets.
We propose a self-supervised learning framework for visual odometry (VO) that incorporates the correlation of consecutive frames and takes advantage of adversarial learning. Previous methods treat self-supervised VO as a local structure from motion (SfM) problem that recovers depth from a single image and relative poses from image pairs by minimizing the photometric loss between warped and captured images. Because single-view depth estimation is an ill-posed problem, and the photometric loss cannot discriminate distortion artifacts in warped images, the estimated depth is vague and the estimated pose is inaccurate. In contrast to previous methods, our framework learns a compact representation of frame-to-frame correlation, which is updated by incorporating sequential information. The updated representation is used for depth estimation. In addition, we treat VO as a self-supervised image generation task and take advantage of Generative Adversarial Networks (GANs). The generator learns to estimate depth and pose in order to generate a warped target image. The discriminator evaluates the quality of the generated image with high-level structural perception, which overcomes the limitations of the pixel-wise loss used in previous methods. Experiments on the KITTI and Cityscapes datasets show that our method obtains more accurate depth with details preserved, and that its predicted pose significantly outperforms that of state-of-the-art self-supervised methods.
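The photometric loss that the abstract attributes to previous methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the grayscale input, the pinhole intrinsics `K`, and the 4x4 target-to-source transform `T` are all assumptions made for illustration, and nearest-neighbour sampling stands in for the differentiable bilinear sampling that a trainable pipeline would normally need. The adversarial term is not modeled here.

```python
import numpy as np

def warp_source_to_target(src, depth, K, T):
    """Synthesize the target view by sampling `src` at reprojected pixels.

    src:   (H, W) grayscale source image (assumed input format)
    depth: (H, W) depth of the target view
    K:     (3, 3) pinhole camera intrinsics
    T:     (4, 4) rigid transform from target camera to source camera
    Returns the warped image and a mask of pixels that landed inside `src`.
    """
    H, W = depth.shape
    # Homogeneous pixel grid of the target view, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])
    # Back-project pixels to 3D points in the target camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Move the points into the source camera frame.
    pts = T[:3, :3] @ pts + T[:3, 3:4]
    # Project into the source image plane.
    proj = K @ pts
    z = proj[2]
    valid = z > 1e-6                      # keep points in front of the camera
    z_safe = np.where(valid, z, 1.0)      # avoid division by zero
    # Nearest-neighbour sampling (a simplification, see the lead-in).
    ui = np.rint(proj[0] / z_safe).astype(int)
    vi = np.rint(proj[1] / z_safe).astype(int)
    valid &= (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    warped = np.zeros(H * W)
    warped[valid] = src[vi[valid], ui[valid]]
    return warped.reshape(H, W), valid.reshape(H, W)

def photometric_loss(target, warped, valid):
    """Mean absolute intensity difference over validly warped pixels."""
    return np.abs(target - warped)[valid].mean()
```

Because this loss compares pixels independently, a warped image with structural distortions can still score well wherever intensities happen to match, which is the weakness the abstract assigns to the discriminator to address.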