CV | Sep 23, 2024

SOFI: Multi-Scale Deformable Transformer for Camera Calibration with Enhanced Line Queries

arXiv:2409.15553v1 | 4 citations | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses camera calibration, which supports tasks such as 3D rendering and object insertion in images. It is an incremental improvement over prior transformer-based methods.

The paper tackles camera calibration by estimating parameters such as the zenith vanishing point and horizon line. It introduces SOFI, a multi-scale deformable transformer with enhanced line queries, which outperforms existing methods on the Google Street View, Horizon Line in the Wild, and Holicity datasets while maintaining competitive inference speed.

Camera calibration consists of estimating camera parameters such as the zenith vanishing point and horizon line. Estimating these parameters enables other tasks such as 3D rendering, artificial reality effects, and object insertion in an image. Transformer-based models have provided promising results; however, they lack cross-scale interaction. In this work, we introduce SOFI, a multi-Scale defOrmable transFormer for camera calibratIon with enhanced line queries. SOFI improves the line queries used in CTRL-C and MSCC by using both line content and line geometric features. Moreover, SOFI's line queries allow transformer models to adopt the multi-scale deformable attention mechanism to promote cross-scale interaction between the feature maps produced by the backbone. SOFI outperforms existing methods on the Google Street View, Horizon Line in the Wild, and Holicity datasets while keeping a competitive inference speed.
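To make the two target quantities concrete, the sketch below relates the zenith vanishing point and the horizon line to a pinhole camera. It is standard projective geometry, not SOFI's network or its parameterization. The function name `zenith_and_horizon`, the use of a single focal length `f`, and the sign convention for the example up vector (image y axis pointing down) are assumptions made for illustration.

```python
import math


def zenith_and_horizon(up, f, cx=0.0, cy=0.0):
    """Return (zenith_vp, horizon_line) for a pinhole camera.

    up     : unit vector of the world "up" direction in camera coordinates
    f      : focal length in pixels (square pixels, no skew assumed)
    cx, cy : principal point in pixels

    zenith_vp is a homogeneous image point (x, y, w), computed as K @ up.
    horizon_line is (a, b, c) with a*x + b*y + c = 0, computed as K^{-T} @ up.
    """
    ux, uy, uz = up
    # Image of the up direction: z ~ K u
    zenith_vp = (f * ux + cx * uz, f * uy + cy * uz, uz)
    # Directions d with u . d = 0 project onto the line l ~ K^{-T} u
    horizon_line = (ux / f, uy / f, uz - (cx * ux + cy * uy) / f)
    return zenith_vp, horizon_line


# Hypothetical example: camera pitched upward by 10 degrees, f = 500 px.
pitch = math.radians(10.0)
up = (0.0, -math.cos(pitch), math.sin(pitch))
zenith_vp, horizon_line = zenith_and_horizon(up, f=500.0)

# Horizon height at x = 0, relative to the principal point (y grows downward).
horizon_y = -horizon_line[2] / horizon_line[1]
```

With the principal point at the origin, the horizon in this example sits at y = f * tan(pitch) and the zenith vanishing point lies on the vertical axis far above the image, which is the pair of quantities the abstract names as calibration targets.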

Code Implementations: 1 repo
