Projected Representation Conditioning for High-fidelity Novel View Synthesis
This work addresses the problem of generating high-fidelity novel views from limited input, with applications in computer vision and graphics, and represents a strong incremental improvement over existing methods.
The paper tackles novel view synthesis by proposing ReNoV, a diffusion-based framework that uses external representations as conditions to improve geometric consistency in generated viewpoints. The method outperforms prior diffusion-based approaches on standard benchmarks and enables robust synthesis from sparse, unposed image collections.
We propose a novel framework for diffusion-based novel view synthesis that conditions generation on external visual representations, using their geometric and semantic correspondence properties to improve geometric consistency in the generated viewpoints. First, we analyze in detail the correspondence capabilities that emerge in the spatial attention of external visual representations. Building on these insights, we propose ReNoV (representation-guided novel view synthesis), which injects external representations into the diffusion process through dedicated representation projection modules. Our experiments show that this design yields marked improvements in both reconstruction fidelity and inpainting quality, outperforming prior diffusion-based novel view synthesis methods on standard benchmarks and enabling robust synthesis from sparse, unposed image collections.
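To make the idea of a representation projection module concrete, the following is a minimal sketch and not the ReNoV implementation. It assumes, purely for illustration, that the module is a per-token linear map from the external feature width to the denoiser's hidden width, and that the result is added to the hidden states. The abstract does not specify the actual architecture, and the names `project`, `inject`, `weight`, `bias`, and `scale` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def project(features: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Map each external feature vector (length C_ext) to length C_model.

    Assumption: a plain linear layer; the real module may be deeper.
    """
    c_model = len(bias)
    return [
        [sum(f * weight[k][j] for k, f in enumerate(row)) + bias[j]
         for j in range(c_model)]
        for row in features
    ]


def inject(hidden: Matrix, features: Matrix, weight: Matrix,
           bias: List[float], scale: float = 1.0) -> Matrix:
    """Add projected external features to denoiser hidden states.

    Assumption: additive conditioning with one feature vector per token.
    """
    cond = project(features, weight, bias)
    if len(cond) != len(hidden):
        raise ValueError("one conditioning vector is needed per hidden token")
    return [
        [h + scale * c for h, c in zip(h_row, c_row)]
        for h_row, c_row in zip(hidden, cond)
    ]


# Toy data: 2 tokens, external width 2, denoiser width 3.
features = [[1.0, 2.0], [0.0, 1.0]]
hidden = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]

# Zero-initialized projection leaves the hidden states unchanged.
zero_weight = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zero_bias = [0.0, 0.0, 0.0]
unchanged = inject(hidden, features, zero_weight, zero_bias)

# A non-zero projection shifts the hidden states by the projected features.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
conditioned = inject(hidden, features, weight, zero_bias)
print(conditioned)  # [[1.5, 2.5, 0.5], [1.0, 2.0, 1.0]]
```

With a zero-initialized projection the denoiser's hidden states pass through unchanged, which is one common way to add a conditioning path without disturbing a pretrained model at the start of training. Whether ReNoV uses such an initialization is not stated in the abstract.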