Self-improving Multiplane-to-layer Images for Novel View Synthesis
This work addresses the need for efficient and generalizable view synthesis without per-scene optimization, benefiting applications in computer vision and graphics.
The paper tackles lightweight novel-view synthesis for arbitrary forward-facing scenes. It introduces a method that converts multiplane images to deformable layers and refines them with a feed-forward procedure. The method outperforms recent models on common metrics and in human evaluation, with faster inference and more compact geometry.
We present a new method for lightweight novel-view synthesis that generalizes to an arbitrary forward-facing scene. Recent approaches are computationally expensive, require per-scene optimization, or produce a memory-expensive representation. We start by representing the scene with a set of fronto-parallel semitransparent planes and then convert them to deformable layers in an end-to-end manner. Additionally, we employ a feed-forward refinement procedure that corrects the estimated representation by aggregating information from the input views. Our method does not require fine-tuning when a new scene is processed and can handle an arbitrary number of input views. Experimental results show that our approach surpasses recent models in terms of common metrics and human evaluation, with a noticeable advantage in inference speed and in the compactness of the inferred layered geometry. See https://samsunglabs.github.io/MLI
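To make the starting representation concrete, the sketch below shows how a stack of fronto-parallel semitransparent planes is commonly turned into a pixel color: the planes are blended back to front with the standard "over" operator. This is a generic illustration of multiplane-image compositing, not the authors' implementation. The function name `composite_pixel` and the sample values are hypothetical, and the step that warps each plane into the novel view is omitted.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_pixel(samples: Sequence[Tuple[RGB, float]]) -> RGB:
    """Blend (color, alpha) samples for one pixel with the "over" operator.

    `samples` must be ordered back to front (farthest plane first).
    An empty stack yields black.
    """
    out = (0.0, 0.0, 0.0)
    for color, alpha in samples:
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        # The nearer sample covers a fraction `alpha` of what is behind it.
        out = tuple(alpha * c + (1.0 - alpha) * o for c, o in zip(color, out))
    return out


# Hypothetical stack: an opaque red far plane behind a half-transparent blue plane.
pixel = composite_pixel([((1.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 1.0), 0.5)])
print(pixel)  # (0.5, 0.0, 0.5)
```

A practical renderer would apply the same recurrence to whole image tensors at once rather than pixel by pixel. The per-pixel form is used here only to keep the blending rule visible.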