CV · Aug 6, 2025

4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation

arXiv:2508.04467v1 · 1 citation · h-index: 36
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating complex 4D data for applications in dynamic 3D object modeling, though it is an incremental advance because it builds on existing multi-view video methods.

The paper tackles the problem of generating high-quality 4D content by proposing 4DVD, a cascaded video diffusion model that decouples the task into coarse layout generation and structure-aware conditional generation, achieving state-of-the-art performance in novel view synthesis and 4D generation.

Given the high complexity of directly generating high-dimensional data such as 4D content, we present 4DVD, a cascaded video diffusion model that generates 4D content in a decoupled manner. Unlike previous multi-view video methods, which jointly model 3D spatial and temporal features with stacked cross-view and temporal attention modules, 4DVD decouples the task into two subtasks, coarse multi-view layout generation and structure-aware conditional generation, and effectively unifies them. Specifically, given a monocular video, 4DVD first predicts dense-view layout content with superior cross-view and temporal consistency. Based on these layout priors, a structure-aware spatio-temporal generation branch combines the coarse structural priors with the fine-grained appearance detail of the input monocular video to generate the final high-quality dense-view videos. As a result, explicit 4D representations (such as 4D Gaussians) can be optimized accurately, enabling broader practical application. To train 4DVD, we collect a dynamic 3D object dataset, called D-Objaverse, from the Objaverse benchmark and render 16 videos of 21 frames each for every object. Extensive experiments demonstrate state-of-the-art performance on both novel view synthesis and 4D generation. Our project page is https://4dvd.github.io/
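The decoupled data flow described in the abstract can be sketched at the level of tensor shapes. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Video` placeholder, the function names `coarse_layout_stage`, `structure_aware_stage`, and `cascade`, the (views, frames, channels, height, width) layout, and the 256×256 resolution are all hypothetical. Only the counts of 16 videos and 21 frames per object come from the abstract, and using 16 as the number of output views is an assumption. Each diffusion stage is replaced by a stub that only tracks shapes.

```python
from dataclasses import dataclass
from typing import Tuple

# Shape convention assumed for this sketch: (views, frames, channels, height, width).
Shape = Tuple[int, int, int, int, int]

NUM_VIEWS = 16   # videos rendered per D-Objaverse object (stated in the abstract)
NUM_FRAMES = 21  # frames per rendered video (stated in the abstract)


@dataclass(frozen=True)
class Video:
    """Hypothetical stand-in for a video tensor; only shape and role are tracked."""
    shape: Shape
    role: str


def coarse_layout_stage(mono: Video, num_views: int = NUM_VIEWS) -> Video:
    """Stage 1 (hypothetical stub): monocular video -> coarse dense-view layout."""
    views, frames, channels, height, width = mono.shape
    if views != 1:
        raise ValueError("stage 1 expects a single-view (monocular) input")
    # A real model would sample a video diffusion model here; the stub only
    # fans the single input view out to `num_views` output views.
    return Video((num_views, frames, channels, height, width), "coarse layout prior")


def structure_aware_stage(layout: Video, mono: Video) -> Video:
    """Stage 2 (hypothetical stub): layout priors + input appearance -> final videos."""
    if layout.shape[1] != mono.shape[1]:
        raise ValueError("layout and input video must have the same number of frames")
    # Both inputs condition this stage: `layout` supplies cross-view structure,
    # `mono` supplies appearance detail. The stub keeps the layout's view/frame
    # grid and takes channels and resolution from the input video.
    views, frames = layout.shape[:2]
    _, _, channels, height, width = mono.shape
    return Video((views, frames, channels, height, width), "dense-view video")


def cascade(mono: Video) -> Video:
    """Run the two stages in sequence, mirroring the decoupled design."""
    layout = coarse_layout_stage(mono)
    return structure_aware_stage(layout, mono)


if __name__ == "__main__":
    clip = Video((1, NUM_FRAMES, 3, 256, 256), "input")  # 256x256 is arbitrary
    result = cascade(clip)
    print(result.shape, result.role)
    print("rendered images per object:", NUM_VIEWS * NUM_FRAMES)
```

The point of the split is that stage 1 only has to get cross-view and temporal structure right, while stage 2 conditions on both that structure and the original frames. The final print line also makes the dataset arithmetic explicit: 16 videos of 21 frames each amount to 336 rendered images per object.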
