CV · RO · Jun 3

Unpaired RGB-Thermal Gaussian-Splatting Using Visual Geometric Transformers

arXiv:2606.05491
AI Analysis

This work addresses the scalability limitation of requiring precisely calibrated RGB-thermal pairs for multi-modal 3D reconstruction, enabling practical deployment in unconstrained settings.

The paper introduces a framework for unpaired RGB-thermal novel view synthesis. It uses VGGT to estimate camera poses independently for each modality, aligns the two pose sets via Procrustes with cross-modal feature matching, and trains a multi-modal 3D Gaussian Splatting model. The method achieves competitive performance in thermal view synthesis while maintaining RGB fidelity. The paper also introduces a benchmark that evaluates both per-modality image synthesis and multi-modal coherence.
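The alignment step can be illustrated with a minimal sketch. The summary does not say exactly what the Procrustes step is solved over, so this example assumes that cross-modal feature matches have been lifted to 3D in each modality's VGGT reconstruction, giving 3D-3D correspondences. It also assumes a similarity transform (scale, rotation, translation), because two independent feed-forward reconstructions generally differ in scale. The sketch uses Umeyama's closed-form solution to the Procrustes problem. `umeyama_similarity` and `align_camera_to_world` are hypothetical names, not the authors' code.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points (N >= 3, not collinear),
    e.g. thermal-frame points (src) matched to RGB-frame points (dst).
    The determinant correction keeps R a proper rotation (det R = +1).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)          # 3x3 cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # avoid returning a reflection
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)   # mean squared distance to centroid
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

def align_camera_to_world(c2w, s, R, t):
    """Map a 4x4 camera-to-world pose from the source frame into the target frame."""
    out = np.eye(4)
    out[:3, :3] = R @ c2w[:3, :3]             # orientations compose with R only
    out[:3, 3] = s * R @ c2w[:3, 3] + t       # camera center transforms like a point
    return out
```

Under these assumptions, every thermal camera pose would be passed through `align_camera_to_world` so that both modalities share the RGB reconstruction's coordinate frame. In practice, a robust wrapper such as RANSAC around the closed-form fit would likely be needed to tolerate incorrect cross-modal matches.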

Multi-modal novel view synthesis (NVS) combining RGB and thermal imagery enables precise 3D scene reconstruction with visual and thermal information. However, existing methods typically rely on precisely calibrated RGB-thermal image pairs or stereo setups, limiting scalability and practical deployment. To address this, we introduce a framework for unpaired RGB-thermal NVS that leverages VGGT, a 3D feed-forward transformer architecture, to independently estimate camera poses for each modality. The pose sets are then aligned using the Procrustes algorithm with a cross-modal feature matcher, enabling joint registration without paired calibration. Building on this alignment, we further propose a multi-modal 3D Gaussian Splatting approach that learns directly from unpaired RGB and thermal images. Experiments on diverse scenes demonstrate that our method achieves competitive performance in thermal view synthesis while maintaining RGB fidelity. Moreover, we show that existing reconstruction approaches can produce modality-specific reconstructions that lack cross-modal consistency. We thus introduce a benchmarking framework to rigorously evaluate both per-modality image synthesis and the multi-modal coherence of reconstructed scenes.
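The abstract does not specify how the multi-modal Gaussian Splatting model learns from unpaired images. One plausible arrangement can be sketched as follows. Each Gaussian carries both an RGB color and a scalar thermal value, and both are rendered through the same geometry using the front-to-back alpha compositing of 3D Gaussian Splatting. Each training image then supervises only the channel of its own modality. In this sketch, projection and rasterization are omitted (per-ray opacities are given directly), a plain L1 loss stands in for whatever objective the paper uses, and `composite` and `modality_loss` are hypothetical names.

```python
import numpy as np

def composite(features, alphas):
    """Front-to-back alpha compositing along one ray, as in 3D Gaussian Splatting.

    features: (N, D) per-Gaussian attribute (RGB: D=3, thermal: D=1), sorted near-to-far.
    alphas:   (N,) per-Gaussian opacity at this pixel, in [0, 1].
    """
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance before Gaussian i: prod_{j<i} (1 - alpha_j).
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = alphas * transmittance
    return weights @ np.asarray(features, dtype=float)

def modality_loss(rgb, thermal, alphas, target, modality):
    """L1 loss against one unpaired observation.

    Assumption: RGB views supervise only the color attributes and thermal views
    only the thermal attribute, while opacities (geometry) are shared by both.
    """
    if modality == "rgb":
        pred = composite(rgb, alphas)
    elif modality == "thermal":
        pred = composite(thermal, alphas)
    else:
        raise ValueError(f"unknown modality: {modality!r}")
    return float(np.abs(pred - np.asarray(target, dtype=float)).mean())
```

Because the opacities, and in a full model the positions and covariances, are shared, gradients from both RGB and thermal views shape the same geometry. This is one way a single scene representation could stay consistent across modalities, even though no image pair is ever observed together.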

