CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model
This work addresses the challenge of panoramic image generation for applications such as virtual reality and photography, offering a more flexible approach than existing methods, though its improvement over multi-view diffusion frameworks is incremental.
The paper tackles the problem of generating 360-degree panoramic images from a single camera-free image and text description by eliminating the need for predefined camera poses, achieving strong robustness and generalization in outpainting tasks.
This paper introduces the Camera-free Diffusion (CamFreeDiff) model for 360-degree image outpainting from a single camera-free image and a text description. The method distinguishes itself from existing strategies, such as MVDiffusion, by eliminating the requirement for predefined camera poses. Instead, our model incorporates a mechanism for predicting homography directly within the multi-view diffusion framework. The core of our approach is to formulate camera estimation as predicting the homography transformation from the input view to a predefined canonical view. The homography provides point-level correspondences between the input image and the target panoramic images, allowing these connections to be enforced by correspondence-aware attention in a fully differentiable manner. Qualitative and quantitative experimental results demonstrate our model's strong robustness and generalization ability for 360-degree image outpainting in the challenging context of camera-free inputs.
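To make the idea of point-level correspondences concrete, the following is a minimal sketch of how a 3x3 homography maps pixel coordinates from one view to another. It is an illustration only, not the CamFreeDiff implementation: the abstract does not specify how the homography is parameterized or predicted, so the sketch assumes the standard pure-rotation form H = K R K^-1, and the helper names (`rotation_homography`, `warp_points`), the intrinsics, and the 10-degree yaw are all hypothetical.

```python
import numpy as np


def rotation_homography(K, R):
    """Homography induced by a pure camera rotation: H = K R K^-1.

    Assumption for illustration; the paper's parameterization may differ.
    """
    return K @ R @ np.linalg.inv(K)


def warp_points(H, pts):
    """Map (N, 2) pixel coordinates through a 3x3 homography H."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = pts_h @ H.T                              # apply H to every point
    return mapped[:, :2] / mapped[:, 2:3]             # divide out the scale


# Hypothetical intrinsics: 256x256 image with a 90-degree field of view.
f = 128.0
K = np.array([[f, 0.0, 128.0],
              [0.0, f, 128.0],
              [0.0, 0.0, 1.0]])

# Hypothetical 10-degree rotation about the vertical axis (yaw).
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])

H = rotation_homography(K, R)

# Each input pixel now has a corresponding location in the other view.
corners = np.array([[0.0, 0.0], [255.0, 0.0], [255.0, 255.0], [0.0, 255.0]])
mapped_corners = warp_points(H, corners)
center_mapped = warp_points(H, np.array([[128.0, 128.0]]))
```

Because every step is a matrix product followed by a division, the mapping is differentiable with respect to the entries of H, which is the property that lets correspondences of this kind feed an attention mechanism trained end to end.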