Diffusion Path Alignment for Long-Range Motion Generation and Domain Transitions
For computer vision and graphics researchers, this work presents the first general framework for controlled long-range human motion generation with explicit transition modeling, enabling applications such as dance choreography.
This paper tackles long-range human motion generation with coherent transitions across semantically distinct domains. It proposes an inference-time optimization framework that regularizes the transition trajectories of a pretrained diffusion model, achieving high fidelity and temporal coherence.
Long-range human motion generation remains a central challenge in computer vision and graphics, and generating coherent transitions across semantically distinct motion domains is largely unexplored. This capability is particularly important for applications such as dance choreography, where movements must transition fluidly across diverse stylistic and semantic motifs. We propose a simple and effective inference-time optimization framework inspired by diffusion-based stochastic optimal control. Specifically, we introduce a control-energy objective that explicitly regularizes the transition trajectories of a pretrained diffusion model. We show that optimizing this objective at inference time yields transitions with high fidelity and temporal coherence. This is the first work to provide a general framework for controlled long-range human motion generation with explicit transition modeling.
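To make the idea of a control-energy objective concrete, here is a minimal toy sketch in the spirit of stochastic optimal control. It is not the method proposed in this work. It assumes a hypothetical scalar state, a linear stand-in `drift` for a pretrained model's dynamics, an Euler discretization, and a quadratic terminal penalty weighted by `w` that pulls the final state toward a `target`. The control sequence `u_k` is optimized at inference time by plain gradient descent, trading the control energy 0.5 * sum(u_k^2) * dt against reaching the target.

```python
# Toy control-energy optimization (hypothetical illustration only).
# Assumptions: scalar state, linear stand-in drift f(x) = -a * x in place of a
# pretrained model, Euler steps of dx = (f(x) + u) dt, quadratic terminal cost.


def drift(x: float, a: float = 1.0) -> float:
    # Hypothetical stand-in for a pretrained model's deterministic drift.
    return -a * x


def rollout(x0: float, controls: list[float], dt: float, a: float = 1.0) -> list[float]:
    # Integrate the controlled dynamics and return every visited state.
    xs = [x0]
    for u in controls:
        xs.append(xs[-1] + (drift(xs[-1], a) + u) * dt)
    return xs


def objective(x0: float, controls: list[float], target: float,
              dt: float, w: float, a: float = 1.0) -> float:
    # Control energy plus a terminal penalty on the distance to the target.
    x_final = rollout(x0, controls, dt, a)[-1]
    energy = 0.5 * sum(u * u for u in controls) * dt
    return energy + 0.5 * w * (x_final - target) ** 2


def gradient(x0: float, controls: list[float], target: float,
             dt: float, w: float, a: float = 1.0) -> list[float]:
    # Adjoint (backward) pass; this closed form relies on the drift being linear.
    x_final = rollout(x0, controls, dt, a)[-1]
    p = w * (x_final - target)  # adjoint at the final step, p_K
    grads = [0.0] * len(controls)
    for k in reversed(range(len(controls))):
        grads[k] = controls[k] * dt + p * dt  # p currently holds p_{k+1}
        p *= 1.0 - a * dt                     # p_k = p_{k+1} * (1 - a * dt)
    return grads


def optimize_controls(x0: float, target: float, steps: int = 50, dt: float = 0.02,
                      w: float = 100.0, a: float = 1.0, lr: float = 0.5,
                      iters: int = 500) -> list[float]:
    # Gradient descent on the control sequence, starting from zero control.
    controls = [0.0] * steps
    for _ in range(iters):
        g = gradient(x0, controls, target, dt, w, a)
        controls = [u - lr * gk for u, gk in zip(controls, g)]
    return controls


if __name__ == "__main__":
    u = optimize_controls(0.0, 1.0)
    print("final state:", rollout(0.0, u, 0.02)[-1])
    print("objective:", objective(0.0, u, 1.0, 0.02, 100.0))
```

Without control, the toy state starting at 0 never moves; the optimized controls steer it close to the target while the energy term keeps them small, so the trajectory stays near the uncontrolled dynamics. In a real motion model, the analytic adjoint would presumably be replaced by automatic differentiation through the denoiser.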