AirSplat: Alignment and Rating for Robust Feed-Forward 3D Gaussian Splatting
This work addresses high-fidelity, pose-free novel view synthesis for 3D vision applications. It is an incremental advance that integrates robust geometric priors into the synthesis pipeline.
The paper adapts 3D Vision Foundation Models to generalizable novel view synthesis without pose information and reports significantly better reconstruction quality than state-of-the-art pose-free methods.
While 3D Vision Foundation Models (3DVFMs) have demonstrated remarkable zero-shot capabilities in visual geometry estimation, applying them directly to generalizable novel view synthesis (NVS) remains challenging. In this paper, we propose AirSplat, a novel training framework that adapts the robust geometric priors of 3DVFMs for high-fidelity, pose-free NVS. Our approach makes two key technical contributions: (1) Self-Consistent Pose Alignment (SCPA), a training-time feedback loop that ensures pixel-aligned supervision and thereby resolves the discrepancy between estimated pose and geometry; and (2) Rating-based Opacity Matching (ROM), which uses a sparse-view NVS teacher model's knowledge of local 3D geometric consistency to filter out degraded primitives. Experimental results on large-scale benchmarks demonstrate that our method significantly outperforms state-of-the-art pose-free NVS approaches in reconstruction quality. AirSplat highlights the potential of adapting 3DVFMs to enable simultaneous visual geometry estimation and high-quality view synthesis.
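The abstract does not give the ROM formulation, so the following is only an illustrative sketch of one way a per-primitive rating could supervise opacity. It assumes each Gaussian primitive has an opacity in (0, 1) and that a teacher supplies a rating in [0, 1] per primitive (1 = geometrically consistent, 0 = degraded). The function names `rating_opacity_loss` and `keep_mask`, the binary cross-entropy objective, and the 0.5 threshold are all hypothetical choices made for this example, not the authors' method.

```python
import math

def rating_opacity_loss(opacities, ratings, eps=1e-6):
    """Mean binary cross-entropy between predicted opacities and teacher ratings.

    Assumption: this objective is a stand-in for illustration; the paper's
    actual ROM loss is not specified in the abstract.
    """
    total = 0.0
    for o, r in zip(opacities, ratings):
        o = min(max(o, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(r * math.log(o) + (1.0 - r) * math.log(1.0 - o))
    return total / len(opacities)

def keep_mask(opacities, threshold=0.5):
    """Mark primitives whose opacity is at or above a (hypothetical) threshold."""
    return [o >= threshold for o in opacities]

# Two primitives: the teacher rates the first as reliable, the second as degraded.
ratings = [1.0, 0.0]
matched = rating_opacity_loss([0.9, 0.1], ratings)     # opacities agree with ratings
mismatched = rating_opacity_loss([0.1, 0.9], ratings)  # opacities contradict ratings
print(matched < mismatched)
print(keep_mask([0.9, 0.1]))
```

Under these assumptions, minimizing the loss pushes the opacity of low-rated primitives toward zero, so that they contribute little to rendering and can be dropped by the threshold.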