CV · Jun 30, 2025

TextMesh4D: High-Quality Text-to-4D Mesh Generation

arXiv:2506.24121v1 · 2 citations · h-index: 6
AI Analysis

TextMesh4D addresses the challenging and largely unexplored task of text-to-4D mesh generation for content creation and offers a cost-effective solution.

The paper tackles the generation of dynamic 3D content from text prompts (text-to-4D) and reports state-of-the-art results in temporal consistency, structural fidelity, and visual realism while running on a single 24GB GPU.

Recent advances in diffusion generative models have significantly improved image, video, and 3D content creation from user-provided text prompts. However, the challenging problem of dynamic 3D content generation (text-to-4D) with diffusion guidance remains largely unexplored. In this paper, we introduce TextMesh4D, a novel framework for high-quality text-to-4D generation. Our approach leverages per-face Jacobians as a differentiable mesh representation and decomposes 4D generation into two stages: static object creation and dynamic motion synthesis. We further propose a flexibility-rigidity regularization term to stabilize Jacobian optimization under video diffusion priors, ensuring robust geometric performance. Experiments demonstrate that TextMesh4D achieves state-of-the-art results in terms of temporal consistency, structural fidelity, and visual realism. Moreover, TextMesh4D operates with low GPU memory overhead, requiring only a single 24GB GPU, and offers a cost-effective yet high-quality solution for text-driven 4D mesh generation. The code will be released to facilitate future research in text-to-4D generation.
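The abstract does not say how the per-face Jacobians are computed or how the flexibility-rigidity term is defined. As a rough illustration only, the sketch below uses one common construction for triangle meshes: the Jacobian of a face is the 3x3 linear map that takes its rest-pose edge frame (two edges plus the unit normal) to its deformed frame. The function names (`per_face_jacobians`, `rigidity_deviation`) and the penalty `||J^T J - I||_F^2` are assumptions made for this example, not the paper's formulation.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return [x / n for x in a]

def frame(v0, v1, v2):
    """3x3 matrix whose columns are the two edge vectors and the unit normal."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    n = unit(cross(e1, e2))
    return [[e1[r], e2[r], n[r]] for r in range(3)]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def inverse(M):
    """Inverse of a 3x3 matrix; its columns are cross products of M's rows over det."""
    r0, r1, r2 = M
    c0, c1, c2 = cross(r1, r2), cross(r2, r0), cross(r0, r1)
    det = sum(r0[i] * c0[i] for i in range(3))
    return [[c0[r] / det, c1[r] / det, c2[r] / det] for r in range(3)]

def per_face_jacobians(rest, deformed, faces):
    """One 3x3 Jacobian per triangle, mapping the rest frame to the deformed frame.

    Hypothetical helper: assumes non-degenerate triangles (nonzero area).
    """
    jacobians = []
    for i, j, k in faces:
        R = frame(rest[i], rest[j], rest[k])
        D = frame(deformed[i], deformed[j], deformed[k])
        jacobians.append(matmul(D, inverse(R)))
    return jacobians

def rigidity_deviation(J):
    """Squared Frobenius norm of J^T J - I; zero when J is a pure rotation.

    Illustrative penalty only, not the paper's flexibility-rigidity term.
    """
    total = 0.0
    for r in range(3):
        for c in range(3):
            jtj = sum(J[k][r] * J[k][c] for k in range(3))
            total += (jtj - (1.0 if r == c else 0.0)) ** 2
    return total
```

For a single triangle rotated rigidly, the Jacobian is the rotation itself and the deviation is zero; stretching one edge makes the deviation positive. Because each Jacobian depends smoothly on the vertex positions, a representation of this kind can be optimized by gradient descent, which is the property the abstract refers to as differentiable.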

