CV, GR · Feb 7, 2025

Fillerbuster: Multi-View Scene Completion for Casual Captures

Ethan Weber, Norman Müller, Yash Kant, Vasu Agrawal, Michael Zollhöfer, Angjoo Kanazawa, Christian Richardt

arXiv:2502.05175v1


AI Analysis

This work addresses scene completion for casual captures, helping users fill in parts of a scene that none of their photos observed.

Fillerbuster completes unknown regions of a 3D scene from casual captures, handling hundreds of input frames with unknown camera parameters. A single model predicts multiple new images and camera poses together for scene completion.

We present Fillerbuster, a method that completes unknown regions of a 3D scene by utilizing a novel large-scale multi-view latent diffusion transformer. Casual captures are often sparse and miss surrounding content behind objects or above the scene. Existing methods are not suitable for handling this challenge as they focus on making the known pixels look good with sparse-view priors, or on creating the missing sides of objects from just one or two photos. In reality, we often have hundreds of input frames and want to complete areas that are missing and unobserved from the input frames. Additionally, the images often do not have known camera parameters. Our solution is to train a generative model that can consume a large context of input frames while generating unknown target views and recovering image poses when desired. We show results where we complete partial captures on two existing datasets. We also present an uncalibrated scene completion task where our unified model predicts both poses and creates new content. Our model is the first to predict many images and poses together for scene completion.
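To make the idea of "consuming a large context of input frames while generating unknown target views" more concrete, here is a minimal sketch of one common way to set up masked multi-view diffusion training. It assumes a standard DDPM-style forward process and a per-element mask that marks observed content as clean context and unobserved content as a target to generate. This is not the authors' implementation: the function names `build_noisy_input` and `masked_eps_loss`, the tensor layout, and the idea of masking pose channels (for example, a per-pixel ray map) the same way as image latents are all hypothetical assumptions for illustration.

```python
import numpy as np


def build_noisy_input(latents, known_mask, alpha_bar, rng):
    """Hypothetical masked-diffusion input for one training step.

    latents:    (V, C, H, W) per-view latents; C could hold image latents and,
                by assumption, pose channels such as a ray map
    known_mask: (V, 1, H, W) with 1 where content is observed, 0 where it must be generated
    alpha_bar:  cumulative noise-schedule value in (0, 1] for the sampled timestep
    """
    noise = rng.standard_normal(latents.shape)
    # DDPM forward process: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
    noised = np.sqrt(alpha_bar) * latents + np.sqrt(1.0 - alpha_bar) * noise
    # Observed entries stay clean (context); only unknown entries are noised (targets).
    x_t = known_mask * latents + (1.0 - known_mask) * noised
    return x_t, noise


def masked_eps_loss(pred_noise, noise, known_mask):
    """Mean squared noise-prediction error, averaged over unknown elements only."""
    unknown = 1.0 - known_mask
    # The (V, 1, H, W) mask broadcasts across channels.
    sq = unknown * (pred_noise - noise) ** 2
    denom = np.broadcast_to(unknown, sq.shape).sum()
    return sq.sum() / max(denom, 1.0)


# Usage: four views, where the first three are observed context and the last is a target.
rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 3, 8, 8))
mask = np.ones((4, 1, 8, 8))
mask[3] = 0.0
x_t, noise = build_noisy_input(latents, mask, alpha_bar=0.5, rng=rng)
print(masked_eps_loss(noise, noise, mask))  # a perfect noise prediction gives zero loss
```

Because the mask is per element rather than per view, the same sketch covers both generating entirely new target views and filling holes inside partially observed ones. Under the same assumption, leaving the pose channels of the input frames unmasked as targets would correspond to the "recovering image poses when desired" setting.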

