ROODI: Reconstructing Occluded Objects with Denoising Inpainters
This work addresses object extraction from complex 3D scenes for computer vision and graphics applications, proposing a new method for a known bottleneck.
The paper tackles the extraction of specific objects from 3D Gaussian Splatting scenes, which is challenging because of occlusions and irrelevant primitives. It proposes a method that combines Wasserstein distance-based pruning with diffusion-based inpainting, and it outperforms state-of-the-art approaches on a standard real-world dataset and a synthetic dataset.
While the quality of novel-view images has improved dramatically with 3D Gaussian Splatting, extracting specific objects from scenes remains challenging. Isolating the individual 3D Gaussian primitives that belong to each object and handling occlusions in scenes are both far from solved. We propose a novel object extraction method based on two key principles: (1) object-centric reconstruction through removal of irrelevant primitives; and (2) leveraging generative inpainting to compensate for missing observations caused by occlusions. For pruning, we propose to remove irrelevant Gaussians by measuring how close each one is to its K-nearest neighbors and removing those that are statistical outliers. Importantly, these distances must take into account the actual spatial extent that the Gaussians cover; we thus propose to use Wasserstein distances. For inpainting, we employ an off-the-shelf diffusion-based inpainter combined with occlusion reasoning that utilizes the 3D representation of the entire scene. Our findings highlight the crucial synergy between proper pruning and inpainting, both of which significantly enhance extraction performance. We evaluate our method on a standard real-world dataset and introduce a synthetic dataset for quantitative analysis. Our approach outperforms the state of the art, demonstrating its effectiveness in object extraction from complex scenes.
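The pruning idea can be illustrated with a minimal sketch. It uses the standard closed form of the squared 2-Wasserstein distance between two Gaussians, W2^2 = ||m1 - m2||^2 + Tr(S1 + S2 - 2 (S2^(1/2) S1 S2^(1/2))^(1/2)), which accounts for the spatial extent of each primitive through its covariance. The abstract does not specify the outlier rule, so everything beyond that formula is an assumption made for illustration: the function names (`sqrtm_psd`, `wasserstein2_sq`, `prune_outliers`), the parameters `k` and `num_std`, the mean-plus-standard-deviation threshold, and the brute-force pairwise search are hypothetical and are not the authors' implementation.

```python
import numpy as np


def sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def wasserstein2_sq(mu1, cov1, mu2, cov2):
    """Squared 2-Wasserstein distance between N(mu1, cov1) and N(mu2, cov2)."""
    root2 = sqrtm_psd(cov2)
    cross = sqrtm_psd(root2 @ cov1 @ root2)
    bures = np.trace(cov1 + cov2 - 2.0 * cross)
    # Clamp the covariance term at zero to absorb floating-point error.
    return float(np.sum((mu1 - mu2) ** 2) + max(bures, 0.0))


def prune_outliers(means, covs, k=3, num_std=2.0):
    """Return a boolean keep-mask over Gaussians (hypothetical helper).

    Assumed rule: a Gaussian is an outlier when its mean Wasserstein
    distance to its k nearest neighbors exceeds the global mean of that
    statistic by more than num_std standard deviations.
    Requires k < len(means). Brute force, O(n^2) distance evaluations.
    """
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.sqrt(wasserstein2_sq(means[i], covs[i], means[j], covs[j]))
            dist[i, j] = dist[j, i] = d
    np.fill_diagonal(dist, np.inf)  # a Gaussian is not its own neighbor
    knn_mean = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    threshold = knn_mean.mean() + num_std * knn_mean.std()
    return knn_mean <= threshold
```

Here `means` is an array of shape (n, 3) and `covs` an array of shape (n, 3, 3); the returned mask selects the Gaussians to keep. A scene-scale implementation would need a spatial index or GPU batching in place of the double loop.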