CV
Jun 9, 2025

Drive Any Mesh: 4D Latent Diffusion for Mesh Deformation from Video

arXiv:2506.07489v1
10 citations
h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient mesh animation in the game and film industries. The method avoids the manual effort that skeletal approaches require and produces meshes that rasterization-based rendering engines handle efficiently, unlike implicit representations.

The authors address mesh animation from monocular video. They develop a 4D latent diffusion model that drives an existing mesh, generating animations of complex motions quickly and in a form compatible with modern rendering engines.

We propose DriveAnyMesh, a method for driving a mesh with guidance from monocular video. Current 4D generation techniques encounter challenges with modern rendering engines. Implicit methods have low rendering efficiency and are unfriendly to rasterization-based engines, while skeletal methods demand significant manual effort and lack cross-category generalization. Animating existing 3D assets, instead of creating 4D assets from scratch, demands a deep understanding of the input's 3D structure. To tackle these challenges, we present a 4D diffusion model that denoises sequences of latent sets, which are then decoded to produce mesh animations from point cloud trajectory sequences. These latent sets are produced by a transformer-based variational autoencoder and simultaneously capture 3D shape and motion information. A spatiotemporal, transformer-based diffusion model exchanges information across multiple latent frames, enhancing the efficiency and generalization of the generated results. Our experimental results demonstrate that DriveAnyMesh can rapidly produce high-quality animations for complex motions and is compatible with modern rendering engines. This method holds potential for applications in both the game and film industries.
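
To make the idea of exchanging information across latent frames concrete, the following is a minimal NumPy sketch of one common way to do it: attention within each frame's latent set, followed by attention across frames for each latent token. It is not the authors' architecture. The function names, the (T, N, D) layout, and the absence of learned projections, normalization, and denoising steps are all simplifying assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Single-head self-attention with no learned projections (assumption).

    x has shape (batch, tokens, dim); attention runs over the tokens axis.
    """
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x


def spatiotemporal_block(latents):
    """Hypothetical factorized block over latents of shape (T, N, D).

    T = frames, N = latent tokens per frame, D = channels.
    """
    # Spatial step: tokens within the same frame attend to each other.
    x = latents + self_attention(latents)
    # Temporal step: each token index attends across all frames.
    xt = x.transpose(1, 0, 2)          # (N, T, D)
    xt = xt + self_attention(xt)
    return xt.transpose(1, 0, 2)       # back to (T, N, D)


rng = np.random.default_rng(0)
latents = rng.normal(size=(4, 8, 16))  # 4 frames, 8 latent tokens, 16 channels
out = spatiotemporal_block(latents)
print(out.shape)
```

Because of the temporal step, changing the latents of one frame changes the output for every other frame, which is the sense in which information is shared across the sequence.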

