CV · AI · Sep 15, 2022

Bridging Implicit and Explicit Geometric Transformation for Single-Image View Synthesis

arXiv:2209.07105v38.810 citations · h-index: 40

Originality: Incremental advance

AI Analysis

This work addresses a key challenge in computer vision for applications like virtual reality and robotics by improving view synthesis efficiency and quality, though it is incremental as it builds on existing explicit and implicit methods.

The paper tackles the 'seesaw' problem in single-image view synthesis, where preserving reprojected contents conflicts with completing realistic out-of-view regions. It proposes a framework that combines explicit and implicit geometric transformations with a complementary loss function. The resulting non-autoregressive model outperforms autoregressive state-of-the-art methods and generates images about 100 times faster.

Single-image novel view synthesis has made tremendous strides with advanced autoregressive models, since unseen regions must be inferred from the visible scene contents. Although recent methods generate high-quality novel views, synthesizing with only one explicit or implicit 3D geometry involves a trade-off between two objectives that we call the "seesaw" problem: 1) preserving reprojected contents and 2) completing realistic out-of-view regions. Autoregressive models also incur considerable computational cost. In this paper, we propose a single-image view synthesis framework that mitigates the seesaw problem while using an efficient non-autoregressive model. Motivated by the observation that explicit methods preserve reprojected pixels well and implicit methods complete realistic out-of-view regions, we introduce a loss function that makes the two renderers complement each other. The loss encourages explicit features to improve the reprojected area of implicit features, and implicit features to improve the out-of-view area of explicit features. With the proposed architecture and loss function, we alleviate the seesaw problem, outperforming autoregressive state-of-the-art methods while generating an image $\approx$100 times faster. We validate the efficiency and effectiveness of our method with experiments on the RealEstate10K and ACID datasets.
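
The complementary loss described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's actual loss: the function name `complementary_loss`, the L1 distance, the `(C, H, W)` feature layout, and the binary visibility mask are all assumptions made for illustration. The idea shown is only the masking: inside the reprojected (visible) region the implicit features are compared against the explicit ones, and in the out-of-view region the explicit features are compared against the implicit ones.

```python
import numpy as np


def complementary_loss(explicit_feat, implicit_feat, visible_mask):
    """Hypothetical masked L1 loss between two renderers' feature maps.

    explicit_feat, implicit_feat: arrays of shape (C, H, W).
    visible_mask: array of shape (H, W); 1 where a source pixel reprojects
        into the target view, 0 where the region is out of view.

    Returns (in_view_term, out_of_view_term). In an autodiff framework one
    would presumably treat explicit features as the fixed target for the
    first term and implicit features as the fixed target for the second;
    that choice is an assumption and is not modeled in this NumPy sketch.
    """
    explicit_feat = np.asarray(explicit_feat, dtype=np.float64)
    implicit_feat = np.asarray(implicit_feat, dtype=np.float64)
    # Add a leading axis so the mask broadcasts over channels.
    mask = np.asarray(visible_mask, dtype=np.float64)[None, :, :]

    diff = np.abs(explicit_feat - implicit_feat)  # shape (C, H, W)
    channels = explicit_feat.shape[0]
    eps = 1e-8  # avoids division by zero when a region is empty

    # Mean absolute difference over the reprojected (visible) region.
    in_view = (mask * diff).sum() / (mask.sum() * channels + eps)
    # Mean absolute difference over the out-of-view region.
    out_of_view = ((1.0 - mask) * diff).sum() / ((1.0 - mask).sum() * channels + eps)
    return in_view, out_of_view


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_explicit = rng.normal(size=(4, 8, 8))
    f_implicit = rng.normal(size=(4, 8, 8))
    mask = np.zeros((8, 8))
    mask[:, :5] = 1.0  # pretend the left part of the image is reprojected
    print(complementary_loss(f_explicit, f_implicit, mask))
```

Keeping the two terms separate makes it possible to weight them independently; how the paper actually weights or combines them is not stated in the abstract.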
