CV · May 11

AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting

arXiv:2605.10239
AI Analysis

This work targets two bottlenecks of feed-forward 3D Gaussian Splatting in computer vision: cross-domain generalization and high-frequency geometric fidelity.

AdaptSplat introduces a lightweight Frequency-Preserving Adapter (FPA, 1.5M parameters) for feed-forward 3D Gaussian Splatting. The FPA extracts high-frequency structural priors from the shallow features of a vision foundation model backbone. With it, AdaptSplat achieves state-of-the-art reconstruction on multiple benchmarks, with improved cross-domain generalization and geometric fidelity.

This work explores a simple yet powerful lightweight adapter design for feed-forward 3D Gaussian Splatting (3DGS). Existing methods typically add complex, architecture-specific designs on top of the generic pipeline of image feature extraction $\rightarrow$ multi-view interaction $\rightarrow$ feature decoding. However, constrained by the limited scale of 3D training data and the low-pass filtering effect of deep networks, these methods still fall short in cross-domain generalization and high-frequency geometric fidelity. To address these problems, we propose AdaptSplat and show that, without complex component engineering, inserting a single adapter of only 1.5M parameters into the generic architecture is sufficient to achieve superior performance. Specifically, we design a lightweight Frequency-Preserving Adapter (FPA) that extracts direction-aware high-frequency structural priors from the shallow features of a powerful vision foundation model backbone and integrates them into the generic pipeline via high-frequency positional encodings and adaptive residual modulation. This compensates for the high-frequency attenuation caused by over-smoothing in deep features and improves the fitting accuracy of Gaussian primitives on complex surfaces and sharp boundaries. Extensive experiments demonstrate that AdaptSplat achieves state-of-the-art feed-forward reconstruction performance on multiple standard benchmarks, with stable generalization across domains. Code available at: https://github.com/xmw666/AdaptSplat.
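The abstract describes the FPA only at a high level. The toy pure-Python sketch below illustrates two of the ideas it names: low-pass smoothing attenuates high-frequency detail, and direction-aware detail taken from shallow features can be added back as a scaled residual. It rests on our own assumptions, not on the AdaptSplat code. A one-level 2D Haar transform stands in for the direction-aware extractor, and a 3x3 box blur stands in for deep-network over-smoothing. A single scalar `gamma` stands in for adaptive residual modulation. The names `box_blur`, `haar_details`, `energy`, `inject` and `gamma` are hypothetical, and the high-frequency positional encodings are not modeled.

```python
"""Toy sketch (hypothetical, not AdaptSplat's FPA): directional Haar detail
bands from a sharp 'shallow' map are injected into a blurred 'deep' map."""
from typing import List, Tuple

Grid = List[List[float]]


def box_blur(x: Grid) -> Grid:
    """3x3 mean filter with clamped borders; a stand-in for the low-pass
    smoothing that deep layers tend to apply (assumption)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    total += x[ii][jj]
            out[i][j] = total / 9.0
    return out


def haar_details(x: Grid) -> Tuple[Grid, Grid, Grid]:
    """One-level 2D Haar detail bands over non-overlapping 2x2 blocks.
    Returns (row_diff, col_diff, diag) maps at half resolution."""
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    row_diff, col_diff, diag = [], [], []
    for i in range(0, h, 2):
        r, c, d = [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            p, q = x[i + 1][j], x[i + 1][j + 1]
            r.append((a + b - p - q) / 2)  # top minus bottom: horizontal edges
            c.append((a - b + p - q) / 2)  # left minus right: vertical edges
            d.append((a - b - p + q) / 2)  # checkerboard: diagonal structure
        row_diff.append(r)
        col_diff.append(c)
        diag.append(d)
    return row_diff, col_diff, diag


def energy(bands) -> float:
    """Sum of squared coefficients over all detail bands."""
    return sum(v * v for band in bands for row in band for v in row)


def inject(deep_bands, shallow_bands, gamma: float):
    """Residual injection, band by band: deep + gamma * shallow.
    gamma mimics a learnable scale; gamma = 0 leaves the deep path unchanged."""
    return tuple(
        [[dv + gamma * sv for dv, sv in zip(drow, srow)]
         for drow, srow in zip(db, sb)]
        for db, sb in zip(deep_bands, shallow_bands)
    )


# Usage: a vertical step edge between columns 2 and 3 of an 8x8 map.
shallow = [[0.0 if j < 3 else 1.0 for j in range(8)] for _ in range(8)]
deep = box_blur(shallow)
s_bands = haar_details(shallow)
d_bands = haar_details(deep)
print(energy(s_bands), energy(d_bands))  # 4.0 vs. about 0.44: detail attenuated
fused = inject(d_bands, s_bands, gamma=0.5)
print(energy(fused))  # about 2.78: part of the lost detail is restored
```

Starting such a scale at zero is a common way to attach an adapter to a pretrained backbone without changing its initial outputs. Whether AdaptSplat initializes or parameterizes its modulation this way is not stated in this summary.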
