CV · Jun 3

Controllable Dynamic 3D Shape Generation via 3D Trajectories and Text

arXiv:2606.05162
Predicted impact: top 54% in CV (last 90 days) · Originality: incremental advance
AI Analysis

This work addresses the challenge of generating precise 3D object motions from text by incorporating 3D trajectory control, benefiting applications in animation and virtual content creation.

T2Mo is a feed-forward framework for controllable dynamic 3D shape generation that uses 3D trajectories and text as input. It produces motions that faithfully follow given prompts with higher expressiveness while preserving motion quality, outperforming text-based and cascaded video-based baselines.

We introduce T2Mo, a feed-forward framework for controllable dynamic 3D shape generation conditioned on 3D trajectories and text. Due to the inherent ambiguity of language, generating precisely intended motions using text alone remains challenging. To address this, we adopt 3D trajectories as controllable spatial guidance, specifying the exact paths along which selected points should move. By combining both, T2Mo generates object motions that spatially adhere to the given trajectories while globally reflecting the text semantics. To robustly handle trajectory inputs with arbitrary configurations, ranging from dense to sparse and unevenly distributed, we further propose a shape-grounded trajectory embedding that maps an input trajectory set into a shape-aware token set covering the entire object. We conduct extensive comparisons against text-based baselines and cascaded video-based baselines that combine trajectory-guided video generation with video-to-dynamic mesh generation. Quantitative and qualitative evaluations, along with user studies, demonstrate that our approach produces motions that more faithfully follow the given prompts with higher expressiveness while preserving motion quality.
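The abstract does not spell out how the shape-grounded trajectory embedding is computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hand-written, non-learned scheme: each input trajectory is anchored at its first frame, and its displacement is spread onto every shape point with normalized Gaussian distance weights, so that any number of trajectories (dense, sparse, or unevenly placed) yields one token per shape point. The function name `shape_grounded_tokens`, the `sigma` bandwidth, and the appended confidence channel are all hypothetical choices made for this example.

```python
import numpy as np

def shape_grounded_tokens(shape_pts, traj, sigma=0.1):
    """Illustrative only (not the paper's embedding).

    shape_pts: (N, 3) points sampled on the object.
    traj:      (K, F, 3) trajectories; traj[:, 0] are the anchor positions.
    Returns an (N, 3*F + 1) array: one token per shape point.
    """
    shape_pts = np.asarray(shape_pts, dtype=float)
    traj = np.asarray(traj, dtype=float)

    anchors = traj[:, 0, :]                    # (K, 3) starting positions
    disp = traj - anchors[:, None, :]          # (K, F, 3) displacement per frame

    # Distance from every shape point to every trajectory anchor: (N, K)
    d = np.linalg.norm(shape_pts[:, None, :] - anchors[None, :, :], axis=-1)

    # Normalized Gaussian weights (assumed kernel), computed stably per point
    logits = -(d ** 2) / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)          # rows sum to 1

    # Blend trajectory displacements onto each shape point: (N, F, 3)
    motion = np.einsum("nk,kfc->nfc", w, disp)

    # Confidence decays with distance to the nearest anchor (assumed feature)
    conf = np.exp(-d.min(axis=1) / sigma)      # (N,)

    flat = motion.reshape(len(shape_pts), -1)  # (N, 3*F)
    return np.concatenate([flat, conf[:, None]], axis=1)
```

In this sketch the token count depends only on the number of shape points, which mirrors the stated goal of covering the entire object regardless of how the input trajectories are distributed. A learned model would presumably replace the fixed kernel with trained components.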
