CV · Jan 20

One-Shot Refiner: Boosting Feed-forward Novel View Synthesis via One-Step Diffusion

arXiv:2601.14161v1 · 2 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work improves novel view synthesis for computer vision applications, but the advance is incremental because it builds on existing ViT-based and diffusion methods.

The paper tackles high-fidelity novel view synthesis from sparse images by addressing two limitations of feed-forward 3D Gaussian Splatting methods: low-resolution inputs and 3D-agnostic generative enhancement. The authors report superior generation quality across multiple datasets.

We present a novel framework for high-fidelity novel view synthesis (NVS) from sparse images, addressing key limitations in recent feed-forward 3D Gaussian Splatting (3DGS) methods built on Vision Transformer (ViT) backbones. While ViT-based pipelines offer strong geometric priors, they are often constrained by low-resolution inputs due to computational costs. Moreover, existing generative enhancement methods tend to be 3D-agnostic, resulting in inconsistent structures across views, especially in unseen regions. To overcome these challenges, we design a Dual-Domain Detail Perception Module, which enables handling high-resolution images without being limited by the ViT backbone, and endows Gaussians with additional features to store high-frequency details. We develop a feature-guided diffusion network, which can preserve high-frequency details during the restoration process. We introduce a unified training strategy that enables joint optimization of the ViT-based geometric backbone and the diffusion-based refinement module. Experiments demonstrate that our method can maintain superior generation quality across multiple datasets.

