CV · Feb 12

LUVE: Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts

arXiv:2602.11564v1 · 8 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating high-quality ultra-high-resolution video for media and AI applications. It is an incremental improvement over existing video diffusion models.

The paper proposes LUVE, a latent-cascaded framework with dual frequency experts for ultra-high-resolution video generation, and reports superior photorealism and content fidelity in its experiments.

Recent advances in video diffusion models have significantly improved visual quality, yet ultra-high-resolution (UHR) video generation remains a formidable challenge due to the compounded difficulties of motion modeling, semantic planning, and detail synthesis. To address these limitations, we propose LUVE, a Latent-cascaded UHR Video generation framework built upon dual frequency Experts. LUVE employs a three-stage architecture comprising low-resolution motion generation for motion-consistent latent synthesis, video latent upsampling that performs resolution upsampling directly in the latent space to mitigate memory and computational overhead, and high-resolution content refinement that integrates low-frequency and high-frequency experts to jointly enhance semantic coherence and fine-grained detail generation. Extensive experiments demonstrate that our LUVE achieves superior photorealism and content fidelity in UHR video generation, and comprehensive ablation studies further validate the effectiveness of each component. The project is available at https://unicornanrocinu.github.io/LUVE_web/.
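To make the three-stage data flow in the abstract concrete, here is a toy NumPy sketch. It is not the authors' implementation: the function names, the nearest-neighbour upsampler, the box-blur frequency split, and the identity "experts" are all illustrative assumptions standing in for learned models. It only shows how a low-resolution latent could be upsampled in latent space and then refined by separate low- and high-frequency branches.

```python
import numpy as np

def upsample_latent(z, scale=2):
    """Nearest-neighbour spatial upsampling of a (T, C, H, W) latent.
    Assumption: a stand-in for the paper's learned latent upsampler."""
    return z.repeat(scale, axis=2).repeat(scale, axis=3)

def split_frequencies(z, k=3):
    """Split a latent into low/high-frequency parts using a k x k box blur.
    Assumption: k is odd; the paper's actual decomposition is not specified here."""
    pad = k // 2
    padded = np.pad(z, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = z.shape[2], z.shape[3]
    low = np.zeros_like(z)
    for dy in range(k):
        for dx in range(k):
            low += padded[:, :, dy:dy + h, dx:dx + w]
    low /= k * k
    return low, z - low  # the high band is the residual

def refine(z, low_expert, high_expert):
    """Stage 3: route each frequency band through its own (hypothetical) expert."""
    low, high = split_frequencies(z)
    return low_expert(low) + high_expert(high)

def generate(rng, frames=4, channels=4, size=8, scale=2):
    # Stage 1 placeholder: random noise stands in for a low-resolution motion latent.
    z_low = rng.standard_normal((frames, channels, size, size))
    # Stage 2: upsample directly in latent space, without decoding to pixels.
    z_up = upsample_latent(z_low, scale)
    # Stage 3: identity experts here; real experts would be learned networks.
    identity = lambda x: x
    return z_low, refine(z_up, identity, identity)

rng = np.random.default_rng(0)
z_low, z_out = generate(rng)
print(z_low.shape, z_out.shape)
```

With identity experts the two bands sum back to the upsampled latent, which makes the sketch easy to sanity-check before swapping in real models.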

