CV | May 2, 2025

T-Graph: Enhancing Sparse-view Camera Pose Estimation by Pairwise Translation Graph

arXiv:2505.01207v1 | h-index: 3 | ISPRS Journal of Photogrammetry and Remote Sensing
Originality: Incremental advance
AI Analysis

This work addresses a fundamental challenge in remote sensing by enhancing pose estimation in sparse-view scenarios, though it is incremental as it builds on existing methods.

The paper tackles sparse-view camera pose estimation by introducing T-Graph, a plug-and-play module that leverages pairwise translation information, resulting in improvements of 1% to 6% in camera center accuracy across 2 to 8 viewpoints.

Sparse-view camera pose estimation, which aims to estimate the 6-Degree-of-Freedom (6-DoF) poses from a limited number of images captured from different viewpoints, is a fundamental yet challenging problem in remote sensing applications. Existing methods often overlook the translation information between each pair of viewpoints, leading to suboptimal performance in sparse-view scenarios. To address this limitation, we introduce T-Graph, a lightweight, plug-and-play module to enhance camera pose estimation in sparse-view settings. T-Graph takes paired image features as input and maps them through a Multilayer Perceptron (MLP). It then constructs a fully connected translation graph, where nodes represent cameras and edges encode their translation relationships. It can be seamlessly integrated into existing models as an additional branch in parallel with the original prediction, maintaining efficiency and ease of use. Furthermore, we introduce two pairwise translation representations, relative-t and pair-t, formulated under different local coordinate systems. While relative-t captures intuitive spatial relationships, pair-t offers a rotation-disentangled alternative. Together, the two representations make the module adaptable to diverse application scenarios and further improve its robustness. Extensive experiments on two state-of-the-art methods (RelPose++ and Forge) using public datasets (CO3D and IMC PhotoTourism) validate both the effectiveness and generalizability of T-Graph. The results demonstrate consistent improvements across various metrics, notably camera center accuracy, which improves by 1% to 6% from 2 to 8 viewpoints.
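The data flow described in the abstract (paired image features, an MLP, and a fully connected graph whose edges carry translations) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the names `mlp` and `build_translation_graph`, the concatenation of the two per-view features, the two-layer ReLU network, the layer sizes, the random untrained weights, and the choice to store one directed edge per ordered camera pair. The sketch also does not distinguish between relative-t and pair-t; it only shows the graph structure.

```python
import itertools

import numpy as np


def mlp(x, params):
    """Two-layer perceptron with a ReLU; the architecture is illustrative only."""
    w1, b1, w2, b2 = params
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2


def build_translation_graph(features, params):
    """Return {(i, j): 3-vector} for every ordered camera pair with i != j.

    Nodes are camera indices; each edge value stands in for a predicted
    pairwise translation (hypothetical output, weights are untrained).
    """
    num_views = features.shape[0]
    edges = {}
    for i, j in itertools.permutations(range(num_views), 2):
        # Assumed pairing scheme: concatenate the two per-view features.
        pair_feature = np.concatenate([features[i], features[j]])
        edges[(i, j)] = mlp(pair_feature, params)
    return edges


# Toy inputs: 4 views with 16-dimensional features (sizes are arbitrary).
rng = np.random.default_rng(0)
num_views, feat_dim, hidden_dim = 4, 16, 32
features = rng.normal(size=(num_views, feat_dim))
params = (
    rng.normal(size=(2 * feat_dim, hidden_dim)) * 0.1,
    np.zeros(hidden_dim),
    rng.normal(size=(hidden_dim, 3)) * 0.1,
    np.zeros(3),
)
graph = build_translation_graph(features, params)
```

With N views this produces N(N-1) directed edges, so the 2 to 8 viewpoint settings mentioned in the abstract would give between 2 and 56 edges under this directed-edge assumption.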

