CV | May 28, 2021

TransCamP: Graph Transformer for 6-DoF Camera Pose Estimation

arXiv:2105.14065v1 | 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses camera relocalization for computer vision tasks such as SLAM. The advance appears incremental: it builds on prior neural network methods, and its novelty lies in integrating a graph transformer.

The authors propose TransCamP, a neural network with a graph transformer backbone for camera pose estimation. It fuses image features, camera poses, and inter-frame motions into graph attributes. The authors report significantly higher computational efficiency than prior work guided by photometric consistency, and results that outperform state-of-the-art approaches on public benchmarks.

Camera pose estimation or camera relocalization is the centerpiece in numerous computer vision tasks such as visual odometry, structure from motion (SfM) and SLAM. In this paper we propose a neural network approach with a graph transformer backbone, namely TransCamP, to address the camera relocalization problem. In contrast with prior work where the pose regression is mainly guided by photometric consistency, TransCamP effectively fuses the image features, camera pose information and inter-frame relative camera motions into encoded graph attributes and is trained towards the graph consistency and accuracy instead, yielding significantly higher computational efficiency. By leveraging graph transformer layers with edge features and enabling tensorized adjacency matrix, TransCamP dynamically captures the global attention and thus endows the pose graph with evolving structures to achieve improved robustness and accuracy. In addition, optional temporal transformer layers actively enhance the spatiotemporal inter-frame relation for sequential inputs. Evaluation of the proposed network on various public benchmarks demonstrates that TransCamP outperforms state-of-the-art approaches.
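The abstract refers to inter-frame relative camera motions and to training towards graph consistency. As a rough illustration of what consistency on a pose graph can mean, the sketch below computes the relative motion between two absolute camera poses and compares it with a measured relative motion on one edge. This is a generic pose-graph residual, not the loss used in TransCamP; the function names (`pose_from_yaw`, `relative_motion`, `edge_residual`) and the yaw-only pose constructor are assumptions made for this example.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_pose(T):
    """Invert a rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def pose_from_yaw(yaw, t):
    """Hypothetical helper: build a 4x4 pose from a yaw angle and a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, t[0]],
            [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def relative_motion(T_i, T_j):
    """Motion from frame i to frame j, i.e. T_i^{-1} T_j."""
    return matmul(invert_pose(T_i), T_j)

def edge_residual(T_i, T_j, T_ij_measured):
    """Return (translation error, rotation error in radians) for one graph edge."""
    # E is the identity when the absolute poses agree with the measured edge.
    E = matmul(invert_pose(T_ij_measured), relative_motion(T_i, T_j))
    trans_err = math.sqrt(sum(E[i][3] ** 2 for i in range(3)))
    trace = E[0][0] + E[1][1] + E[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return trans_err, math.acos(cos_angle)

# Two absolute poses and the relative motion "measured" between them.
T_a = pose_from_yaw(0.0, [0.0, 0.0, 0.0])
T_b = pose_from_yaw(math.pi / 2, [1.0, 0.0, 0.0])
measured = relative_motion(T_a, T_b)

# A consistent pair gives a near-zero residual; a shifted pose does not.
consistent = edge_residual(T_a, T_b, measured)
T_b_shifted = pose_from_yaw(math.pi / 2, [1.5, 0.0, 0.0])
inconsistent = edge_residual(T_a, T_b_shifted, measured)
```

In a pose graph, each frame would be a node carrying an absolute pose and each edge would carry a relative motion, so summing such residuals over all edges gives one plausible measure of how consistent the graph is.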

