CV | Jun 30, 2025

ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models

arXiv:2506.23513v1 | 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the challenge of synthesizing realistic panoramic videos for applications in VR, world models, and spatial intelligence; it is a domain-specific advance.

The paper tackles the problem of generating high-quality 360-degree immersive videos by addressing the modality gap between panoramic and perspective data, achieving state-of-the-art performance in panoramic video synthesis.

Panoramic video generation aims to synthesize 360-degree immersive videos, holding significant importance in the fields of VR, world models, and spatial intelligence. Existing works fail to synthesize high-quality panoramic videos due to the inherent modality gap between panoramic data and perspective data, which constitutes the majority of the training data for modern diffusion models. In this paper, we propose a novel framework utilizing pretrained perspective video models for generating panoramic videos. Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously. With our proposed Pano-Perspective attention mechanism, the model benefits from pretrained perspective priors and captures the panoramic spatial correlations of the ViewPoint map effectively. Extensive experiments demonstrate that our method can synthesize highly dynamic and spatially consistent panoramic videos, achieving state-of-the-art performance and surpassing previous methods.
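
To make the modality gap mentioned in the abstract concrete, here is a minimal sketch of standard equirectangular geometry. It is not the paper's ViewPoint map or its attention mechanism, and the abstract does not say which panorama format the authors start from; the equirectangular layout, the axis convention, and the function names `direction_to_equirect` and `horizontal_stretch` are all assumptions made for illustration.

```python
import math


def direction_to_equirect(x, y, z, width, height):
    """Map a 3D view direction to (u, v) pixel coordinates in an
    equirectangular panorama of size width x height.

    Assumed convention: +z is forward, +x is right, +y is up, and
    longitude 0 (straight ahead) lands at the horizontal image centre.
    """
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)       # longitude in [-pi, pi]
    lat = math.asin(y / norm)    # latitude in [-pi/2, pi/2]
    u = (lon / (2 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v


def horizontal_stretch(lat_deg):
    """Horizontal stretch factor of the equirectangular projection at a
    given latitude, relative to the equator (1 / cos(latitude))."""
    return 1.0 / math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    # Looking straight ahead maps to the centre of the panorama.
    print(direction_to_equirect(0.0, 0.0, 1.0, 2048, 1024))
    # Content at 60 degrees latitude is stretched about twice as wide
    # as the same content at the equator.
    print(horizontal_stretch(60.0))
```

Under these assumptions, the stretch factor grows without bound toward the poles, which is one reason frames in this layout look unlike the perspective footage that pretrained video models are mostly trained on.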

