CV AI LG Nov 20, 2022

A Lightweight Domain Adaptive Absolute Pose Regressor Using Barlow Twins Objective

Praveen Kumar Rajendran, Quoc-Vinh Lai-Dang, Luiz Felipe Vecchietti, Dongsoo Har

arXiv:2211.10963v12.61 citations h-index: 18

Originality: Incremental advance

AI Analysis

This work addresses domain shift issues in camera pose estimation for applications like robotics and AR/VR, offering an incremental improvement over existing methods.

The paper tackles domain generalization in absolute camera pose estimation by introducing a domain adaptive training framework that combines the Barlow Twins objective with a lightweight CNN architecture. The method achieves performance comparable to transformer-based methods with significantly fewer FLOPs, activations, and parameters, and ranks 2nd on Cambridge Landmarks and 4th on 7Scenes.

Identifying the camera pose for a given image is a challenging problem with applications in robotics, autonomous vehicles, and augmented/virtual reality. Lately, learning-based methods have been shown to be effective for absolute camera pose estimation. However, these methods are not accurate when generalizing to different domains. In this paper, a domain adaptive training framework for absolute pose regression is introduced. In the proposed framework, the scene image is augmented for different domains by using generative methods to train parallel branches using the Barlow Twins objective. The parallel branches leverage a lightweight CNN-based absolute pose regressor architecture. Further, the efficacy of incorporating spatial and channel-wise attention in the regression head for rotation prediction is investigated. Our method is evaluated on two datasets, Cambridge Landmarks and 7Scenes. The results demonstrate that, even while using roughly 24 times fewer FLOPs, 12 times fewer activations, and 5 times fewer parameters than MS-Transformer, our approach outperforms all the CNN-based architectures and achieves performance comparable to transformer-based architectures. Our method ranks 2nd and 4th on the Cambridge Landmarks and 7Scenes datasets, respectively. In addition, for augmented domains not encountered during training, our approach significantly outperforms MS-Transformer. Furthermore, it is shown that our domain adaptive framework achieves better performance than the single-branch model trained with the identical CNN backbone on all instances of the unseen distribution.
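
For readers unfamiliar with the Barlow Twins objective mentioned in the abstract, the following is a minimal NumPy sketch of its general form, not the authors' implementation. It assumes each of the two parallel branches yields a batch of embeddings for the same scenes rendered in two domains. The function name `barlow_twins_loss`, the off-diagonal weight `lambda_offdiag` and its default value are illustrative assumptions. How the paper combines this term with the pose regression loss is not stated here and is left out.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-9):
    """Barlow Twins-style loss for two (batch, dim) embedding matrices.

    z_a and z_b are assumed to come from two branches that see the same
    scenes under different domain augmentations. lambda_offdiag is an
    assumed default, not a value taken from the paper summarized above.
    """
    n = z_a.shape[0]
    # Standardize every feature dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two branches, shape (dim, dim).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: pull matching feature correlations towards 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: push all other correlations towards 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_offdiag * off_diag

# Illustrative usage with random embeddings (hypothetical data).
rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
loss_same = barlow_twins_loss(z, z)
loss_unrelated = barlow_twins_loss(z, rng.normal(size=(256, 8)))
```

In this sketch, two branches that produce identical embeddings give a much lower loss than two branches with unrelated embeddings. This is the behavior that encourages domain-invariant features.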
