CV, Jun 30, 2025

TextMesh4D: High-Quality Text-to-4D Mesh Generation

Sisi Dai, Xinxin Su, Boyan Wan, Ruizhen Hu, Kai Xu

arXiv:2506.24121v111.82 citations, h-index: 6

Originality: Highly original

AI Analysis

It addresses the challenging and largely unexplored task of text-to-4D mesh generation for content creation, offering a cost-effective solution.

The paper tackles the problem of generating dynamic 3D content (text-to-4D) from text prompts, achieving state-of-the-art results in temporal consistency, structural fidelity, and visual realism with low GPU memory usage (single 24GB GPU).

Recent advancements in diffusion generative models have significantly advanced image, video, and 3D content creation from user-provided text prompts. However, the challenging problem of dynamic 3D content generation (text-to-4D) with diffusion guidance remains largely unexplored. In this paper, we introduce TextMesh4D, a novel framework for high-quality text-to-4D generation. Our approach leverages per-face Jacobians as a differentiable mesh representation and decomposes 4D generation into two stages: static object creation and dynamic motion synthesis. We further propose a flexibility-rigidity regularization term to stabilize Jacobian optimization under video diffusion priors, ensuring robust geometric performance. Experiments demonstrate that TextMesh4D achieves state-of-the-art results in terms of temporal consistency, structural fidelity, and visual realism. Moreover, TextMesh4D operates with a low GPU memory overhead, requiring only a single 24GB GPU, and thus offers a cost-effective yet high-quality solution for text-driven 4D mesh generation. The code will be released to facilitate future research in text-to-4D generation.
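
To make the "per-face Jacobians" idea above concrete, here is a minimal NumPy sketch of one common way to define a per-triangle Jacobian: the 3x3 deformation gradient that maps a rest triangle's edge frame to its deformed edge frame, with the unit normal as the third axis. This is an illustrative assumption, not the TextMesh4D implementation; the function name `per_face_jacobians` and the choice of a normal-augmented 3x3 frame are hypothetical, and the abstract does not specify the authors' exact formulation or their regularization term.

```python
import numpy as np

def per_face_jacobians(rest_vertices, deformed_vertices, faces):
    """Return one 3x3 Jacobian per triangle (shape: F x 3 x 3).

    Hypothetical helper: assumes non-degenerate triangles and uses the
    face normal as a third frame axis so each frame is invertible.
    """
    rest = np.asarray(rest_vertices, dtype=float)
    deformed = np.asarray(deformed_vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)

    def edge_frames(vertices):
        # Gather the three corners of every face.
        v0, v1, v2 = (vertices[faces[:, k]] for k in range(3))
        e1, e2 = v1 - v0, v2 - v0
        normal = np.cross(e1, e2)
        normal /= np.linalg.norm(normal, axis=1, keepdims=True)
        # Stack the two edges and the unit normal as matrix columns.
        return np.stack([e1, e2, normal], axis=2)

    rest_frames = edge_frames(rest)
    deformed_frames = edge_frames(deformed)
    # Solve J @ rest_frame = deformed_frame for each face.
    return deformed_frames @ np.linalg.inv(rest_frames)
```

Under this definition, translating the mesh leaves every Jacobian unchanged, and a rigid rotation of the whole mesh yields that rotation matrix on every face, which is why Jacobians are a convenient quantity to optimize and regularize instead of raw vertex positions.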
