CV | May 22, 2025

Conditional Panoramic Image Generation via Masked Autoregressive Modeling

arXiv:2505.16862v2 | 4 citations | h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses panoramic image generation for applications such as virtual reality and 360-degree media. It offers a more efficient and integrated approach, though it is an incremental improvement over existing generative models.

The authors address two limitations of existing panoramic image generation methods: reliance on diffusion models, which are ill-suited to equirectangular projections, and separate treatment of text and image conditioning. They propose a unified autoregressive framework that achieves competitive performance in text-to-image generation and panorama outpainting.

Recent progress in panoramic image generation has underscored two critical limitations in existing approaches. First, most methods are built upon diffusion models, which are inherently ill-suited for equirectangular projection (ERP) panoramas due to the violation of the identically and independently distributed (i.i.d.) Gaussian noise assumption caused by their spherical mapping. Second, these methods often treat text-conditioned generation (text-to-panorama) and image-conditioned generation (panorama outpainting) as separate tasks, relying on distinct architectures and task-specific data. In this work, we propose a unified framework, Panoramic AutoRegressive model (PAR), which leverages masked autoregressive modeling to address these challenges. PAR avoids the i.i.d. assumption constraint and integrates text and image conditioning into a cohesive architecture, enabling seamless generation across tasks. To address the inherent discontinuity in existing generative models, we introduce circular padding to enhance spatial coherence and propose a consistency alignment strategy to improve generation quality. Extensive experiments demonstrate competitive performance in text-to-image generation and panorama outpainting tasks while showcasing promising scalability and generalization capabilities.
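As a rough illustration of the circular padding idea mentioned in the abstract, the sketch below wraps the left and right edges of a toy equirectangular grid so that the columns on either side of the 360-degree seam become neighbours. It assumes a plain nested-list "image" and a hypothetical helper name, `circular_pad_width`; the abstract does not say where in PAR the padding is applied (pixels, latents, or convolution layers), so this is not the authors' implementation.

```python
def circular_pad_width(rows, pad):
    """Wrap-pad each row by `pad` columns on both sides (hypothetical helper).

    `rows` is a list of equal-length lists, one per image row. The left pad
    is copied from the right edge and the right pad from the left edge, so
    the horizontal seam of a panorama becomes locally continuous.
    """
    if pad < 0:
        raise ValueError("pad must be non-negative")
    if pad == 0:
        # row[-0:] would return the whole row, so handle zero explicitly.
        return [list(row) for row in rows]
    if any(pad > len(row) for row in rows):
        raise ValueError("pad must not exceed the row width")
    return [list(row[-pad:]) + list(row) + list(row[:pad]) for row in rows]


# Toy 2x4 "panorama": each value marks a column position along longitude.
erp = [[0, 1, 2, 3],
       [4, 5, 6, 7]]
padded = circular_pad_width(erp, 1)
print(padded)  # [[3, 0, 1, 2, 3, 0], [7, 4, 5, 6, 7, 4]]
```

Only the width (longitude) axis is wrapped here, since that is the axis along which an equirectangular panorama is periodic. The top and bottom rows correspond to the poles and are left unpadded in this sketch.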
