CV Jul 21, 2024
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions
Haiyang Zhou, Xinhua Cheng, Wangbo Yu et al.
3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry. Owing to the powerful generative capabilities of text-to-image diffusion models, which provide reliable priors, creating 3D scenes from text prompts alone has become viable, significantly advancing research in text-driven 3D scene generation. To obtain multi-view supervision from 2D diffusion models, prevailing methods typically use the diffusion model to generate an initial local image and then iteratively outpaint that image to gradually build up the scene. Nevertheless, these outpainting-based approaches are prone to producing globally inconsistent scenes that lack a high degree of completeness, which restricts their broader application. To tackle these problems, we introduce HoloDreamer, a framework that first generates a high-definition panorama as a holistic initialization of the full 3D scene, then leverages 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes. Specifically, we propose Stylized Equirectangular Panorama Generation, a pipeline that combines multiple diffusion models to enable stylized and detailed equirectangular panorama generation from complex text prompts. Subsequently, Enhanced Two-Stage Panorama Reconstruction is introduced, conducting a two-stage optimization of 3D-GS to inpaint the missing regions and enhance the integrity of the scene. Comprehensive experiments demonstrate that our method outperforms prior works in terms of overall visual consistency and harmony, as well as reconstruction quality and rendering robustness, when generating fully enclosed scenes.
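Since the abstract relies on equirectangular panoramas as the scene initialization, a small sketch of the underlying pixel-to-direction mapping may help. This is not code from the paper: the function names and the axis convention (y up, z forward, longitude increasing to the right) are assumptions chosen for illustration.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to a unit 3D direction.

    Assumed convention (illustrative, not taken from the paper):
    longitude runs from -pi (left edge) to +pi (right edge),
    latitude runs from +pi/2 (top edge) to -pi/2 (bottom edge),
    y is up and z is the forward direction at the image centre.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def direction_to_equirect_pixel(x, y, z, width, height):
    """Inverse mapping: project a 3D direction back to pixel coordinates."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    # Clamp to guard against rounding slightly outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return (u, v)
```

Because every pixel corresponds to a ray leaving a single viewpoint, one panorama covers all viewing directions at once, which is what makes it usable as a holistic initialization.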
79.5 NA Apr 30
Fourier Analysis of Finite Difference Schemes for the Helmholtz Equation in 1D with Dirichlet Conditions: Sharp Estimates and Relative Errors
Martin J. Gander, Hui Zhang, Haiyang Zhou
We consider the Dirichlet problem of the indefinite Helmholtz equation in 1D, $u''+k^2u=f$ in $(0,1)$, $u(0)=g_0$, $u(1)=g_1$, with a constant wavenumber $k\in(0,\infty)\setminus\pi\mathbb{N}$ and a source term $f\in H^p_0(0,1)$, $p\ge 4$. We propose an approach based on Fourier analysis to derive wavenumber-explicit sharp estimates of the absolute and relative errors of \emph{finite difference} methods. Such results have been well known for \emph{finite element} methods (FEM). We use the approach to analyze the classical centered finite difference scheme. For the Fourier interpolants of the discrete solution with homogeneous (or inhomogeneous) Dirichlet conditions, we show rigorously, under the two assumptions $k>20$ and $k(kh)^2/\sigma_k\le 4/(\pi-2)$ with $\sigma_k:=\operatorname{dist}(k,\pi\mathbb{N})$, that the worst-case attainable convergence order of the absolute error with $\sum_{p=0}^4k^{-p}\|f^{(p)}\|_{L^2}=O(1)$ (or $|g_i|\asymp k^{-1}$) is $(kh)^2/\sigma_k^2$ in the $L^2$-norm and $k(kh)^2/\sigma_k^2$ in the $H^1$-semi-norm, and that of the relative error is $k(kh)^2/\sigma_k$ in both the $L^2$-norm and the $H^1$-semi-norm if $\|u^{(p)}\|_{L^2}/\|u^{(p-2)}\|_{L^2}\asymp k^2$ for $p=2,3$. In particular, the lower bounds of these error estimates are established rigorously in the same orders as the upper bounds, which is the main novelty of this work. We also show that the Fourier analysis approach can be used as a convenient visual tool for evaluating finite difference schemes in the presence of source terms, which is beyond the scope of dispersion analysis. The results from the theory and visual analysis are corroborated by numerical experiments.
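To make the setting concrete, the following is a minimal sketch of the classical centered finite difference scheme for $u''+k^2u=f$ on $(0,1)$ with Dirichlet data, $(u_{j-1}-2u_j+u_{j+1})/h^2+k^2u_j=f(x_j)$ with $h=1/n$. It is not the authors' code: the function names, the test problem $u(x)=\sin(\pi x)$, and the use of an unpivoted tridiagonal (Thomas) solve are assumptions made for illustration. Since the system is indefinite, the unpivoted elimination can break down if a pivot happens to be close to zero, so a pivoted banded solver would be the safer choice in practice.

```python
import math


def solve_helmholtz_fd(k, f, g0, g1, n):
    """Centered finite differences for u'' + k^2 u = f, u(0)=g0, u(1)=g1.

    Uses n subintervals (h = 1/n) and returns the n+1 grid values.
    The tridiagonal system is solved with the Thomas algorithm without
    pivoting, which is an illustrative simplification.
    """
    h = 1.0 / n
    m = n - 1                      # number of interior unknowns
    off = 1.0 / h**2               # sub- and super-diagonal entries
    diag = [-2.0 / h**2 + k**2] * m
    rhs = [f(j * h) for j in range(1, n)]
    rhs[0] -= off * g0             # move boundary values to the right-hand side
    rhs[-1] -= off * g1

    # Forward elimination.
    for j in range(1, m):
        w = off / diag[j - 1]
        diag[j] -= w * off
        rhs[j] -= w * rhs[j - 1]

    # Back substitution.
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for j in range(m - 2, -1, -1):
        u[j] = (rhs[j] - off * u[j + 1]) / diag[j]
    return [g0] + u + [g1]


def max_error(u_h, exact):
    """Maximum nodal error of the grid function u_h against exact(x)."""
    n = len(u_h) - 1
    return max(abs(u_h[j] - exact(j / n)) for j in range(n + 1))


if __name__ == "__main__":
    k = 10.0                       # assumed wavenumber, not a multiple of pi
    exact = lambda x: math.sin(math.pi * x)
    f = lambda x: (k**2 - math.pi**2) * math.sin(math.pi * x)
    for n in (100, 200, 400):
        err = max_error(solve_helmholtz_fd(k, f, 0.0, 0.0, n), exact)
        print(n, err)
```

Halving $h$ at fixed $k$ should reduce the nodal error by roughly a factor of four, consistent with the $(kh)^2$ factor in the estimates; the dependence on $k$ and $\sigma_k$ would require varying the wavenumber as well.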
CV Apr 30, 2025
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
Haiyang Zhou, Wangbo Yu, Jiawen Guan et al.
The rapid advancement of diffusion models holds the promise of revolutionizing the application of VR and AR technologies, which typically require scene-level 4D assets for user experience. Nonetheless, existing diffusion models predominantly concentrate on modeling static 3D scenes or object-level dynamics, constraining their capacity to provide truly immersive experiences. To address this issue, we propose HoloTime, a framework that integrates video diffusion models to generate panoramic videos from a single prompt or reference image, along with a 360-degree 4D scene reconstruction method that seamlessly transforms the generated panoramic video into 4D assets, enabling a fully immersive 4D experience for users. Specifically, to tame video diffusion models for generating high-fidelity panoramic videos, we introduce the 360World dataset, the first comprehensive collection of panoramic videos suitable for downstream 4D scene reconstruction tasks. With this curated dataset, we propose Panoramic Animator, a two-stage image-to-video diffusion model that can convert panoramic images into high-quality panoramic videos. Following this, we present Panoramic Space-Time Reconstruction, which leverages a space-time depth estimation method to transform the generated panoramic videos into 4D point clouds, enabling the optimization of a holistic 4D Gaussian Splatting representation to reconstruct spatially and temporally consistent 4D scenes. To validate the efficacy of our method, we conducted a comparative analysis with existing approaches, revealing its superiority in both panoramic video generation and 4D scene reconstruction. This demonstrates our method's capability to create more engaging and realistic immersive environments, thereby enhancing user experiences in VR and AR applications.
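The step from panoramic video plus depth to a 4D point cloud can be illustrated with a simple back-projection. This is only a sketch under stated assumptions, not the paper's Panoramic Space-Time Reconstruction: the function name, the `fps` parameter, the treatment of depth as radial distance from the camera centre, and the axis convention (y up, z forward) are all hypothetical.

```python
import math


def panoramic_depth_to_points(depth_frames, fps=1.0):
    """Back-project equirectangular depth maps into (x, y, z, t) points.

    depth_frames: list of frames; each frame is a list of rows of radial
    depth values laid out as an equirectangular image (assumed format).
    Non-positive depths are treated as invalid and skipped.
    """
    points = []
    for frame_index, depth in enumerate(depth_frames):
        height = len(depth)
        width = len(depth[0])
        t = frame_index / fps      # timestamp of this frame in seconds
        for v, row in enumerate(depth):
            lat = (0.5 - (v + 0.5) / height) * math.pi
            for u, r in enumerate(row):
                if r <= 0.0:
                    continue
                lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
                points.append((
                    r * math.cos(lat) * math.sin(lon),
                    r * math.sin(lat),
                    r * math.cos(lat) * math.cos(lon),
                    t,
                ))
    return points
```

A time-stamped point set of this kind could then serve as the initialization for optimizing a dynamic Gaussian representation, which is the role the abstract assigns to the 4D point clouds.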