CV · Apr 11, 2024

Taming Stable Diffusion for Text to 360° Panorama Image Generation

Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai

arXiv:2404.07949v1 · 22.712 citations · h-index: 24 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses text-to-panorama generation, a problem relevant to applications such as virtual reality. It is an incremental advance because it builds on Stable Diffusion.

The paper tackles the generation of 360-degree panorama images from text prompts. It introduces PanFusion, a dual-branch diffusion model that draws on Stable Diffusion for prior knowledge and uses a projection-aware cross-attention mechanism to reduce distortion. The authors report that it outperforms existing methods.
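The cross-attention idea can be illustrated with a generic scaled dot-product cross-attention, in which tokens from one branch (queries) attend to tokens from the other branch (keys and values). The NumPy sketch below is an illustration under stated assumptions, not PanFusion's implementation. The function name `cross_attention`, the token shapes, and the use of a boolean `mask` to limit each panorama token to a subset of perspective tokens are hypothetical stand-ins for the paper's projection-aware mechanism.

```python
import numpy as np

def cross_attention(queries, keys, values, mask=None):
    """Generic scaled dot-product cross-attention (illustrative sketch only).

    queries: (Nq, d) tokens from one branch (hypothetically, panorama features)
    keys, values: (Nk, d) tokens from the other branch (hypothetically, perspective features)
    mask: optional (Nq, Nk) boolean array; False entries are excluded from attention.
          This masking scheme is an assumption, not the paper's mechanism.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    if mask is not None:
        # Every query must be allowed to see at least one key, otherwise softmax is undefined.
        if not mask.any(axis=-1).all():
            raise ValueError("each query row of the mask needs at least one True entry")
        scores = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights
```

With a mask, masked-out keys receive exactly zero weight, so each query mixes information only from the keys it is allowed to see. This is one simple way to make attention aware of which regions of the two branches correspond.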

Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts. Yet, the generation of 360-degree panorama images from text remains a challenge, particularly due to the dearth of paired text-panorama data and the domain gap between panorama and perspective images. In this paper, we introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt. We leverage the Stable Diffusion model as one branch to provide prior knowledge in natural image generation and register it to another panorama branch for holistic image generation. We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process. Our experiments validate that PanFusion surpasses existing methods and, thanks to its dual-branch structure, can integrate additional constraints like room layout for customized panorama outputs. Code is available at https://chengzhag.github.io/publication/panfusion.
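Registering a perspective branch to a panorama branch relies on knowing which panorama pixels each perspective view sees. The sketch below shows that standard geometry: it maps equirectangular pixels to unit viewing directions and projects them into a square pinhole view rotated by a yaw angle. It is a minimal sketch, not the authors' code. The axis conventions (+y up, +z at longitude 0, longitude increasing left to right) and the function names `equirect_to_dirs` and `project_to_perspective` are assumptions chosen for illustration.

```python
import numpy as np

def equirect_to_dirs(height, width):
    """Unit viewing direction for the centre of every equirectangular pixel.

    Assumed convention: longitude spans [-pi, pi) left to right,
    latitude spans [pi/2, -pi/2] top to bottom, +y is up, +z is longitude 0.
    """
    v, u = np.meshgrid(np.arange(height) + 0.5, np.arange(width) + 0.5, indexing="ij")
    lon = (u / width) * 2 * np.pi - np.pi
    lat = np.pi / 2 - (v / height) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)  # shape (H, W, 3)

def project_to_perspective(dirs, yaw, fov_deg, size):
    """Project unit directions into a square pinhole view rotated by `yaw` radians about +y.

    Returns pixel coordinates (..., 2) and a boolean mask of directions inside the view.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    # World-to-camera rotation: rotate by -yaw about the y-axis.
    rot = np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])
    cam = dirs @ rot.T
    f = (size / 2) / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    z = cam[..., 2]
    safe_z = np.where(z > 1e-8, z, 1.0)  # avoid dividing by zero for points behind the camera
    px = f * cam[..., 0] / safe_z + size / 2
    py = -f * cam[..., 1] / safe_z + size / 2  # image rows grow downward
    inside = (z > 1e-8) & (px >= 0) & (px < size) & (py >= 0) & (py < size)
    return np.stack([px, py], axis=-1), inside
```

For example, with a 64×128 panorama and a 90° view at `yaw=0`, the panorama's centre pixel lands near the centre of the perspective image, while pixels behind the camera are marked as outside. The `inside` mask therefore records which panorama pixels a given view covers, which is the kind of correspondence a projection-aware design needs.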
