CV · Jan 20

One-Shot Refiner: Boosting Feed-forward Novel View Synthesis via One-Step Diffusion

Yitong Dong, Qi Zhang, Minchao Jiang, Zhiqiang Wu, Qingnan Fan, Ying Feng, Huaqi Zhang, Hujun Bao, Guofeng Zhang

arXiv:2601.14161v1 · 4.0 · 3 citations · h-index: 11

Originality: Incremental advance

AI Analysis

This work improves novel view synthesis for computer vision applications, but it is an incremental advance because it builds on existing ViT-based and diffusion methods.

The paper tackles high-fidelity novel view synthesis from sparse images. It targets two limitations of feed-forward 3D Gaussian Splatting methods: their reliance on low-resolution inputs and their use of 3D-agnostic generative enhancement. The authors report superior generation quality across multiple datasets.
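The resolution constraint can be made concrete with a back-of-the-envelope sketch. It assumes a ViT with 14-pixel patches and full global self-attention; both are illustrative assumptions, since the backbone configuration is not given here. Under these assumptions, doubling each image side quadruples the token count and multiplies the pairwise attention cost by 16.

```python
# Illustrative sketch (assumed settings, not the paper's configuration):
# how ViT self-attention cost grows with input resolution.


def num_tokens(height: int, width: int, patch: int = 14) -> int:
    """Patch tokens for an image, counting only full patches along each side."""
    return (height // patch) * (width // patch)


def attention_pairs(height: int, width: int, patch: int = 14) -> int:
    """Token pairs in one global self-attention layer; compute and memory scale with this."""
    n = num_tokens(height, width, patch)
    return n * n


if __name__ == "__main__":
    low = attention_pairs(518, 518)    # 37 x 37 patches
    high = attention_pairs(1036, 1036)  # 74 x 74 patches
    print(num_tokens(518, 518), num_tokens(1036, 1036))
    print(high / low)  # quadratic growth: 4x tokens -> 16x pairs
```

This quadratic growth is why ViT-based pipelines typically downsample their inputs. The abstract describes the Dual-Domain Detail Perception Module as sidestepping this limit, but it does not specify the module's internals.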

We present a novel framework for high-fidelity novel view synthesis (NVS) from sparse images that addresses key limitations in recent feed-forward 3D Gaussian Splatting (3DGS) methods built on Vision Transformer (ViT) backbones. While ViT-based pipelines offer strong geometric priors, their computational cost often restricts them to low-resolution inputs. Moreover, existing generative enhancement methods tend to be 3D-agnostic, which produces inconsistent structures across views, especially in unseen regions. To overcome these challenges, we design a Dual-Domain Detail Perception Module that handles high-resolution images without being limited by the ViT backbone and endows the Gaussians with additional features that store high-frequency details. We develop a feature-guided diffusion network that preserves these high-frequency details during restoration. We also introduce a unified training strategy that jointly optimizes the ViT-based geometric backbone and the diffusion-based refinement module. Experiments demonstrate that our method maintains superior generation quality across multiple datasets.
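The abstract says the Gaussians carry additional features for high-frequency detail, but it does not say how those features are rendered. A common approach in 3DGS-style pipelines, assumed here rather than taken from the paper, alpha-composites per-Gaussian feature vectors front-to-back in the same way colors are composited: F = Σ_i f_i α_i Π_{j<i}(1 - α_j). The sketch below applies this to a single ray. `composite_features` is a hypothetical name.

```python
from typing import List, Sequence


def composite_features(
    alphas: Sequence[float], features: Sequence[Sequence[float]]
) -> List[float]:
    """Front-to-back alpha compositing of per-Gaussian feature vectors along one ray.

    Hypothetical illustration, not the paper's implementation.
    alphas[i]   : opacity of the i-th Gaussian at this pixel, sorted near to far
    features[i] : extra feature vector attached to that Gaussian (assumed channels)
    """
    if not features or len(alphas) != len(features):
        raise ValueError("need one non-empty feature vector per alpha")
    dim = len(features[0])
    out = [0.0] * dim
    transmittance = 1.0  # fraction of the ray not yet occluded by nearer Gaussians
    for a, f in zip(alphas, features):
        weight = a * transmittance  # contribution of this Gaussian
        for k in range(dim):
            out[k] += weight * f[k]
        transmittance *= 1.0 - a  # nearer Gaussians attenuate farther ones
    return out


# Two half-transparent Gaussians with one-hot features:
# the nearer one contributes 0.5, the farther one 0.5 * 0.5 = 0.25.
print(composite_features([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
```

A feature-guided refiner could condition on the resulting per-pixel feature map. The abstract does not state how the paper actually feeds Gaussian features into its diffusion network.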

