GeoFill: Reference-Based Image Inpainting with Better Geometric Understanding
This paper improves reference-based image inpainting for scenes captured with a freely moving camera. It addresses a domain-specific problem and makes incremental advances over prior work.
The paper tackles the challenge of precisely placing pixels from a reference image into a hole region for image inpainting by focusing on 3D geometry understanding, achieving state-of-the-art performance on RealEstate10K and MannequinChallenge datasets with large baselines and complex camera motions.
Reference-guided image inpainting restores image pixels by leveraging content from a single reference image. The primary challenge is how to precisely place the pixels from the reference image into the hole region. Therefore, understanding the 3D geometry that relates pixels between two views is a crucial step towards building a better model. Given the complexity of handling various types of reference images, we focus on the scenario where the images are captured by freely moving the same camera around. Compared to previous work, we propose a principled approach that does not make heuristic assumptions about the planarity of the scene. We leverage a monocular depth estimate and predict the relative pose between cameras, then align the reference image to the target by a differentiable 3D reprojection and a joint optimization of the relative pose and the scale and offset of the depth map. Our approach achieves state-of-the-art performance on both the RealEstate10K and MannequinChallenge datasets with large baselines, complex geometry and extreme camera motions. We experimentally verify that our approach is also better at handling large holes.
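
The 3D reprojection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard pinhole camera with shared intrinsics `K`, a depth map attached to the view being lifted, and a relative pose given as a rotation `R` and translation `t`. The function name `reproject_pixels` and all parameter names are hypothetical. It shows only the forward mapping in NumPy; the joint optimization of pose, scale and offset would additionally need an automatic differentiation framework.

```python
import numpy as np

def reproject_pixels(depth, K, R, t, scale=1.0, offset=0.0):
    """Map every pixel of one view to its location in another view.

    Assumptions (not taken from the paper): pinhole camera, the same
    intrinsics K for both views, and a depth correction of the form
    scale * depth + offset.

    depth : (H, W) monocular depth estimate for the view being lifted
    K     : (3, 3) camera intrinsics
    R, t  : (3, 3) rotation and (3,) translation taking points from the
            lifted view's camera frame into the other view's frame
    Returns an (H, W, 2) array of (x, y) coordinates in the other view.
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates (x, y, 1) for every pixel.
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    # Back-project pixels to rays with unit z, then scale by corrected depth.
    rays = pix @ np.linalg.inv(K).T
    corrected = scale * depth + offset
    points = rays * corrected[..., None]
    # Rigid transform into the other camera frame, then perspective projection.
    points_other = points @ R.T + t
    proj = points_other @ K.T
    return proj[..., :2] / proj[..., 2:3]


# Toy example: constant depth and a small sideways translation.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
coords = reproject_pixels(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

In this toy setup every pixel shifts horizontally by `fx * tx / depth` = 100 × 0.1 / 2 = 5 pixels, which is the parallax a fronto-parallel plane would produce. Changing `scale` or `offset` changes that shift, which is why those two parameters can be tuned together with the pose to improve alignment.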