CV, AI | Apr 17, 2023

Learning to Render Novel Views from Wide-Baseline Stereo Pairs

MIT
arXiv:2304.08463v1 | 112 citations | h-index: 46
AI Analysis

This work addresses the problem of generating realistic novel views from sparse observations, with applications in computer vision and graphics, and represents a strong incremental advance.

The paper tackles novel view synthesis from a single wide-baseline stereo pair. It targets two failure modes of existing methods, incorrectly recovered 3D geometry and costly differentiable rendering, and reports significantly better results than prior work on two real-world datasets.

We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair. In this challenging regime, 3D scene points are regularly observed only once, requiring prior-based reconstruction of scene geometry and appearance. We find that existing approaches to novel view synthesis from sparse observations fail due to recovering incorrect 3D geometry and due to the high cost of differentiable rendering that precludes their scaling to large-scale training. We take a step towards resolving these shortcomings by formulating a multi-view transformer encoder, proposing an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray, and a lightweight cross-attention-based renderer. Our contributions enable training of our method on a large-scale real-world dataset of indoor and outdoor scenes. We demonstrate that our method learns powerful multi-view geometry priors while reducing the rendering time. We conduct extensive comparisons on held-out test scenes across two real-world datasets, significantly outperforming prior work on novel view synthesis from sparse image observations and achieving multi-view-consistent novel view synthesis.
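The two rendering ideas named in the abstract, image-space epipolar line sampling and cross-attention over the sampled features, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with the convention x ~ K(RX + t), a target ray whose near and far points both lie in front of the source camera, and a single attention head with no learned parameters. The function names `project`, `epipolar_samples`, and `cross_attention` are hypothetical.

```python
import numpy as np

def project(K, R, t, X):
    """Project world points X (N, 3) to pixels, assuming x ~ K (R X + t)."""
    cam = X @ R.T + t              # world -> source-camera coordinates
    uv = cam @ K.T                 # camera -> homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]  # perspective divide

def epipolar_samples(K, R, t, origin, direction, near, far, n):
    """Return n pixel locations spaced evenly in image space along the
    projection of a target ray (its epipolar line) in a source view."""
    ends = np.stack([origin + near * direction, origin + far * direction])
    p_near, p_far = project(K, R, t, ends)
    w = np.linspace(0.0, 1.0, n)[:, None]
    # Interpolating in pixel space, not in depth, gives uniform image coverage.
    return (1.0 - w) * p_near + w * p_far

def cross_attention(query, keys, values):
    """Single-head softmax cross-attention of one ray query over n samples."""
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights
```

In this sketch, features would be read from the source image at the returned pixel locations and passed as `keys` and `values`, with a per-ray embedding as `query`. The attention weights then act as a soft choice of where along the epipolar line the ray meets the scene. Sampling evenly in pixels rather than in depth avoids clustering many samples into a few pixels when the far part of the ray projects to a short image segment.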

Code Implementations: 1 repo
