ViewSplat: View-Adaptive Dynamic Gaussian Splatting for Feed-Forward Synthesis
ViewSplat addresses low-fidelity 3D scene reconstruction from unposed images, a problem relevant to applications such as VR/AR, and is presented as a paradigm shift rather than an incremental improvement.
The paper tackles the fidelity gap in feed-forward 3D Gaussian splatting for novel view synthesis by introducing view-adaptive dynamic splatting, which lets each Gaussian primitive adjust its attributes to the target viewpoint. The method achieves state-of-the-art fidelity with fast inference (17 FPS) and real-time rendering (154 FPS).
We present ViewSplat, a view-adaptive 3D Gaussian splatting network for novel view synthesis from unposed images. While recent feed-forward 3D Gaussian splatting has significantly accelerated 3D scene reconstruction by bypassing per-scene optimization, a fundamental fidelity gap remains. We attribute this bottleneck to the limited capacity of single-step feed-forward networks to regress static Gaussian primitives that satisfy all viewpoints. To address this limitation, we shift the paradigm from static primitive regression to view-adaptive dynamic splatting. Instead of a rigid Gaussian representation, our pipeline learns a view-adaptable latent representation. Specifically, ViewSplat initially predicts base Gaussian primitives alongside the weights of dynamic MLPs. During rendering, these MLPs take target view coordinates as input and predict view-dependent residual updates for each Gaussian attribute (i.e., 3D position, scale, rotation, opacity, and color). This mechanism, which we term view-adaptive dynamic splatting, allows each primitive to rectify initial estimation errors, effectively capturing high-fidelity appearances. Extensive experiments demonstrate that ViewSplat achieves state-of-the-art fidelity while maintaining fast inference (17 FPS) and real-time rendering (154 FPS).
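The abstract describes the dynamic-MLP mechanism only at a high level. The following is a minimal NumPy sketch of the idea under our own assumptions, not the paper's implementation. Each Gaussian gets a tiny two-layer MLP whose weights (random stand-ins here for what the network would predict) map an encoding of the target view to a 14-dimensional residual, which is split across position, scale, rotation, opacity, and color and added to the base attributes. The view encoding (camera center plus viewing direction), the hidden width, the pre-activation parameterization of scale and opacity, and the names `dynamic_mlp_residuals`, `apply_residuals`, and `ATTR_DIMS` are all hypothetical.

```python
import numpy as np

# Hypothetical residual layout per Gaussian (assumed, not taken from the paper):
# 3D position, log-scale, quaternion rotation, opacity logit, RGB color.
ATTR_DIMS = {"position": 3, "scale": 3, "rotation": 4, "opacity": 1, "color": 3}
OUT_DIM = sum(ATTR_DIMS.values())  # 14 outputs per Gaussian


def dynamic_mlp_residuals(view, w1, b1, w2, b2):
    """Run N per-Gaussian two-layer MLPs on one shared target-view encoding.

    view: (D,) target-view coordinates; w1: (N, D, H); b1: (N, H);
    w2: (N, H, OUT_DIM); b2: (N, OUT_DIM). Returns (N, OUT_DIM) residuals.
    """
    hidden = np.einsum("d,ndh->nh", view, w1) + b1  # per-Gaussian first layer
    hidden = np.maximum(hidden, 0.0)                 # ReLU
    return np.einsum("nh,nho->no", hidden, w2) + b2  # per-Gaussian output layer


def apply_residuals(base, residuals):
    """Add view-dependent residuals to base attributes (assumed pre-activation space)."""
    updated, start = {}, 0
    for name, dim in ATTR_DIMS.items():
        updated[name] = base[name] + residuals[:, start:start + dim]
        start += dim
    # Renormalize so each rotation stays a unit quaternion.
    updated["rotation"] = updated["rotation"] / np.linalg.norm(
        updated["rotation"], axis=1, keepdims=True)
    return updated


# Toy usage: random stand-ins for the predicted base Gaussians and MLP weights.
rng = np.random.default_rng(0)
N, D, H = 5, 6, 16
base = {name: rng.normal(size=(N, dim)) for name, dim in ATTR_DIMS.items()}
w1, b1 = 0.1 * rng.normal(size=(N, D, H)), np.zeros((N, H))
w2, b2 = 0.1 * rng.normal(size=(N, H, OUT_DIM)), np.zeros((N, OUT_DIM))

# Assumed view encoding: camera center (3 values) followed by viewing direction (3 values).
view = np.array([0.0, 0.0, 2.0, 0.0, 0.0, -1.0])
updated = apply_residuals(base, dynamic_mlp_residuals(view, w1, b1, w2, b2))
```

In this sketch the base primitives and MLP weights would be produced once per scene, so only the small per-Gaussian MLP evaluation repeats for each new target view. That split is one plausible reading of why the abstract reports inference and rendering speeds separately.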