CV · Dec 8, 2020

Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry

arXiv:2101.02143v1 · 3 citations
Originality: Highly original
AI Analysis

This work addresses a weakness of existing unsupervised visual odometry methods for autonomous systems, which tend to be inaccurate, slow to train, or prone to accumulating error.

This paper proposes a new unsupervised visual odometry method that combines two pose estimators: one for pairwise images (F2FPE) and another for short image sequences using a Transformer-like structure (TAPE). The method achieves state-of-the-art performance among unsupervised learning-based methods and is comparable to supervised and traditional methods on the KITTI and Malaga datasets.

Existing unsupervised visual odometry (VO) methods either match pairwise images or integrate temporal information using recurrent neural networks over a long sequence of images. These methods are inaccurate, time-consuming to train, or prone to error accumulation. In this paper, we propose a method consisting of two camera pose estimators that deal with the information from pairwise images and a short sequence of images, respectively. For image sequences, a Transformer-like structure is adopted to build a geometry model over a local temporal window, referred to as the Transformer-based Auxiliary Pose Estimator (TAPE). Meanwhile, a Flow-to-Flow Pose Estimator (F2FPE) is proposed to exploit the relationship between pairwise images. The two estimators are constrained through a simple yet effective consistency loss in training. Empirical evaluation shows that the proposed method outperforms the state-of-the-art unsupervised learning-based methods by a large margin and performs comparably to supervised and traditional ones on the KITTI and Malaga datasets.
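
The abstract states that the two estimators are tied together by a consistency loss but does not give its form. As a rough illustration only, the sketch below assumes each estimator emits one 6-DoF relative pose per frame pair in a window, written as a hypothetical vector (tx, ty, tz, rx, ry, rz), and penalizes the mean L1 difference between the two sets of predictions. The function name `consistency_loss`, the pose layout, and the `rot_weight` parameter are all assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence

# Hypothetical pose layout (an assumption, not from the paper):
# (tx, ty, tz, rx, ry, rz) = translation followed by rotation angles.
Pose = Sequence[float]


def consistency_loss(
    pairwise_poses: Sequence[Pose],
    window_poses: Sequence[Pose],
    rot_weight: float = 1.0,
) -> float:
    """Mean L1 disagreement between two sets of relative-pose predictions.

    pairwise_poses: poses from a pairwise estimator (F2FPE-like role).
    window_poses:   poses from a temporal-window estimator (TAPE-like role).
    rot_weight:     assumed knob balancing rotation against translation.
    """
    if len(pairwise_poses) != len(window_poses):
        raise ValueError("both estimators must predict the same number of poses")
    if not pairwise_poses:
        raise ValueError("at least one pose pair is required")

    total = 0.0
    for p, q in zip(pairwise_poses, window_poses):
        # L1 distance on the translation components.
        trans = sum(abs(a - b) for a, b in zip(p[:3], q[:3]))
        # L1 distance on the rotation components.
        rot = sum(abs(a - b) for a, b in zip(p[3:6], q[3:6]))
        total += trans + rot_weight * rot

    # Average over the frame pairs in the window.
    return total / len(pairwise_poses)
```

In a real training setup this term would be computed on differentiable tensors and added to the other unsupervised losses; the plain-Python version here only shows the shape of the idea, namely that disagreement between the two pose predictions is penalized.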

