CV · Feb 21, 2024

EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization

arXiv:2402.13537v1 · 4 citations · h-index: 2 · ICRA
Originality: Incremental advance
AI Analysis

This work addresses camera relocalization for applications in AR, drones, robotics, and autonomous driving, presenting an incremental improvement in efficiency and accuracy.

The paper tackles the problem of 6-DOF camera relocalization from single images by proposing EffLoc, a lightweight Vision Transformer that improves memory and computational efficiency, achieving higher accuracy than prior methods like AtLoc and MapNet in large-scale outdoor scenarios.

Camera relocalization is pivotal in computer vision, with applications in AR, drones, robotics, and autonomous driving. It estimates the 3D position and orientation of a camera (its 6-DoF pose) from images. Unlike traditional methods such as SLAM, recent approaches use deep learning for direct end-to-end pose estimation. We propose EffLoc, a novel efficient Vision Transformer for single-image camera relocalization. EffLoc's hierarchical layout, memory-bound self-attention, and feed-forward layers boost memory efficiency and inter-channel communication. The sequential group attention (SGA) module we introduce enhances computational efficiency by diversifying input features, reducing redundancy, and expanding model capacity. EffLoc excels in efficiency and accuracy, outperforming prior methods such as AtLoc and MapNet. It performs well in large-scale outdoor car-driving scenarios while remaining simple and end-to-end trainable, and it eliminates handcrafted loss functions.
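To make the 6-DoF pose above concrete, here is a minimal sketch of how a predicted pose might be compared against ground truth. It is not taken from the paper. It assumes a pose is stored as a 3D position plus a quaternion in (w, x, y, z) order, and the function names and example values are hypothetical.

```python
import math


def translation_error(t_pred, t_true):
    """Euclidean distance between predicted and ground-truth positions."""
    return math.dist(t_pred, t_true)


def rotation_error_deg(q_pred, q_true):
    """Angle in degrees between two orientations given as quaternions.

    Assumption: quaternions are in (w, x, y, z) order; they are
    normalized here, so the inputs need not be unit length.
    """
    def normalize(q):
        n = math.sqrt(sum(c * c for c in q))
        return [c / n for c in q]

    qp, qt = normalize(q_pred), normalize(q_true)
    # q and -q encode the same rotation, so take the absolute dot product;
    # the clamp guards acos against floating-point overshoot.
    d = min(1.0, abs(sum(a * b for a, b in zip(qp, qt))))
    return math.degrees(2.0 * math.acos(d))


# Hypothetical example: ground truth at the origin with identity rotation,
# prediction displaced by (3, 4, 0) and rotated 90 degrees about the z axis.
half = math.radians(90.0) / 2.0
pose_true = ([0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
pose_pred = ([3.0, 4.0, 0.0], [math.cos(half), 0.0, 0.0, math.sin(half)])

t_err = translation_error(pose_pred[0], pose_true[0])
r_err = rotation_error_deg(pose_pred[1], pose_true[1])
```

The two errors are kept separate because position and orientation have different units. Relocalization results are commonly reported as a translation error paired with a rotation error rather than as a single number.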

