ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
This work addresses the challenge of realistic object editing in images, with applications in photo manipulation and visual effects. The contribution is incremental: it builds on existing diffusion models and adds a novel dataset-based approach.
The paper tackles a common failure of diffusion models during image editing: they violate physical laws such as occlusions, shadows, and reflections. It proposes a method for object removal and insertion based on a counterfactual dataset, and reports significantly more photorealistic results than prior methods.
Diffusion models have revolutionized image editing but often generate images that violate physical laws, particularly the effects of objects on the scene, e.g., occlusions, shadows, and reflections. By analyzing the limitations of self-supervised approaches, we propose a practical solution centered on a "counterfactual" dataset. Our method involves capturing a scene before and after removing a single object, while minimizing other changes. By fine-tuning a diffusion model on this dataset, we are able to remove not only objects but also their effects on the scene. However, we find that applying this approach to photorealistic object insertion requires an impractically large dataset. To tackle this challenge, we propose bootstrap supervision: leveraging our object removal model trained on a small counterfactual dataset, we synthetically expand this dataset considerably. Our approach significantly outperforms prior methods in photorealistic object removal and insertion, particularly in modeling the effects of objects on the scene.
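The bootstrap-supervision idea can be illustrated with a minimal Python sketch. It assumes one plausible reading of the abstract: the removal model (trained on the small counterfactual dataset) erases an object and its effects from an ordinary photo, the object pixels are then pasted back onto that clean background without their shadows or reflections, and the resulting (naive composite, original photo) pair becomes a synthetic training example for insertion. The names `TrainingPair`, `paste_object`, `bootstrap_insertion_pairs`, and the stub remover are hypothetical; images are toy lists of pixel intensities and no diffusion model is involved. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy grayscale image: a list of rows of pixel intensities (hypothetical format).
Image = List[List[float]]
# Object mask: True where the object itself is (its effects are NOT included).
Mask = List[List[bool]]


@dataclass
class TrainingPair:
    model_input: Image  # what the insertion model is conditioned on
    target: Image       # what the insertion model should produce


def paste_object(background: Image, source: Image, mask: Mask) -> Image:
    """Copy only the masked object pixels from `source` onto `background`.

    Pixels outside the mask (e.g. shadows or reflections cast by the object)
    are taken from `background`, so the composite lacks the object's effects.
    """
    return [
        [s if m else b for b, s, m in zip(b_row, s_row, m_row)]
        for b_row, s_row, m_row in zip(background, source, mask)
    ]


def bootstrap_insertion_pairs(
    images: List[Image],
    masks: List[Mask],
    remove_object: Callable[[Image, Mask], Image],
) -> List[TrainingPair]:
    """Turn unpaired photos into synthetic insertion training pairs.

    `remove_object` stands in for a trained removal model (assumption): it
    should return the scene with the object AND its effects removed.
    """
    pairs = []
    for image, mask in zip(images, masks):
        # 1. Remove the object and its effects with the removal model.
        background = remove_object(image, mask)
        # 2. Paste the object back naively: object present, effects missing.
        naive_composite = paste_object(background, image, mask)
        # 3. Supervise insertion with the real photo as the target, so the
        #    model must learn to add the missing shadows/reflections.
        pairs.append(TrainingPair(model_input=naive_composite, target=image))
    return pairs
```

For a one-row toy photo `[[0.5, 1.0, 0.2, 0.5]]` (background 0.5, object 1.0, shadow 0.2) with the object masked, and a hypothetical stub remover that returns a uniform 0.5 background, the composite is `[[0.5, 1.0, 0.5, 0.5]]`: the object survives but its shadow does not. Because the target is a real photograph, the insertion model is supervised by physically correct effects even though its input is synthetic.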