CV, LG · Jan 28, 2025

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

arXiv:2501.17162v1 · 31 citations · h-index: 25 · ICLR
Originality: Incremental advance
AI Analysis

This work offers a new approach to panorama generation, with applications in virtual reality and immersive media, although it builds incrementally on existing diffusion models.

The paper tackles the problem of generating 360° panoramas from text or images by repurposing multi-view diffusion models to jointly synthesize the six faces of a cubemap. It reports state-of-the-art qualitative and quantitative results, high-resolution outputs, and fine-grained text control.
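In a standard cubemap, each of the six faces is an ordinary pinhole view with a 90° field of view, which is why a cubemap can be treated as a set of perspective images. The NumPy sketch below illustrates that geometry by resampling an equirectangular panorama onto one cube face. The face orientation table `FACES`, the helper names `face_directions` and `equirect_to_face`, and the nearest-neighbour sampling are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Hypothetical face convention (an assumption, not taken from the paper):
# each face is (forward, right, up) in a world frame with +y up.
FACES = {
    "front": ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
    "right": ((1, 0, 0), (0, 0, -1), (0, 1, 0)),
    "back": ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "left": ((-1, 0, 0), (0, 0, 1), (0, 1, 0)),
    "top": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
    "bottom": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
}


def face_directions(face, size):
    """Unit ray direction for every pixel centre of a size x size cube face."""
    forward, right, up = (np.asarray(a, dtype=np.float64) for a in FACES[face])
    # Pixel centres mapped to [-1, 1]; tan(45 deg) = 1 gives a 90 deg field of view.
    coords = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    u, v = np.meshgrid(coords, -coords)  # v decreases going down the image rows
    dirs = forward + u[..., None] * right + v[..., None] * up
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)


def equirect_to_face(pano, face, size):
    """Nearest-neighbour resampling of an equirectangular image onto one face."""
    h, w = pano.shape[:2]
    d = face_directions(face, size)
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    lon = np.arctan2(x, z)  # longitude in [-pi, pi], 0 at the centre of "front"
    lat = np.arcsin(np.clip(y, -1.0, 1.0))  # latitude, +pi/2 is straight up
    col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return pano[row, col]
```

Calling `equirect_to_face` for all six keys of `FACES` yields the six perspective images that a multi-view model would generate jointly; a production pipeline would more likely use bilinear sampling than nearest neighbour.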

We introduce a novel method for generating 360° panoramas from text prompts or images. Our approach leverages recent advances in 3D generation by employing multi-view diffusion models to jointly synthesize the six faces of a cubemap. Unlike previous methods that rely on processing equirectangular projections or autoregressive generation, our method treats each face as a standard perspective image, simplifying the generation process and enabling the use of existing multi-view diffusion models. We demonstrate that these models can be adapted to produce high-quality cubemaps without requiring correspondence-aware attention layers. Our model allows for fine-grained text control, generates high-resolution panorama images and generalizes well beyond its training set, whilst achieving state-of-the-art results, both qualitatively and quantitatively. Project page: https://cubediff.github.io/
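The abstract states that the adapted models need no correspondence-aware attention layers. One common way to let a single-image model see all six faces at once, assumed here for illustration rather than taken from the paper, is to fold the faces into one token sequence so that the existing self-attention layers attend across faces. The NumPy sketch below contrasts per-face attention with that joint attention; the function names and random weights are hypothetical, and no geometric mask or correspondence term is involved.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Single-head scaled dot-product attention over the token axis."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def per_face_attention(x, wq, wk, wv):
    # x: (faces, tokens, channels). Each face attends only to its own tokens,
    # as if a single-image model were run on every face independently.
    return attention(x @ wq, x @ wk, x @ wv)


def joint_attention(x, wq, wk, wv):
    # Hypothetical "inflated" attention: flatten all faces into one sequence so
    # every token can attend to every token on every face, reusing the same weights.
    f, n, c = x.shape
    flat = x.reshape(1, f * n, c)
    out = attention(flat @ wq, flat @ wk, flat @ wv)
    return out.reshape(f, n, -1)


# Toy demo with random stand-ins for pretrained projection weights.
rng = np.random.default_rng(0)
faces, tokens, channels = 6, 16, 8
x = rng.standard_normal((faces, tokens, channels))
wq, wk, wv = (rng.standard_normal((channels, channels)) / np.sqrt(channels) for _ in range(3))
print(per_face_attention(x, wq, wk, wv).shape, joint_attention(x, wq, wk, wv).shape)
```

Because the joint variant only changes how tokens are grouped, it reuses the pretrained projection weights unchanged; with a single face it reduces exactly to ordinary per-image attention.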

