Controllable Single-shot Animation Blending with Temporal Conditioning
This work addresses a specific need for animators: a controllable framework for blending motions that requires no additional data or extensive retraining. The contribution is incremental, however, since it builds on existing single-shot motion generation methods.
The paper tackles blending multiple human skeletal motions within a single generative pass, a capability that existing single-shot methods do not explicitly offer. It introduces a temporally conditioned framework with skeleton-aware normalization to enable smooth and controllable transitions. According to the authors' evaluations, the method produces plausible, smooth, and controllable motion blends across various animation styles and kinematic skeletons.
Training a generative model on a single human skeletal motion sequence without being bound to a specific kinematic tree has drawn significant attention from the animation community. Unlike text-to-motion generation, single-shot models allow animators to controllably generate variations of existing motion patterns without requiring additional data or extensive retraining. However, existing single-shot methods do not explicitly offer a controllable framework for blending two or more motions within a single generative pass. In this paper, we present the first single-shot motion blending framework that enables seamless blending by temporally conditioning the generation process. Our method introduces a skeleton-aware normalization mechanism to guide the transition between motions, allowing smooth, data-driven control over when and how motions blend. We perform extensive quantitative and qualitative evaluations across various animation styles and different kinematic skeletons, demonstrating that our approach produces plausible, smooth, and controllable motion blends in a unified and efficient manner.
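The abstract does not specify the form of the temporal conditioning signal. As an illustration only, control over "when and how motions blend" can be pictured as a per-frame weight that moves from one source motion to the other over a chosen frame window. The sketch below is not the authors' method: the function names `blend_schedule` and `temporal_condition`, the smoothstep easing, and the two-motion setup are all assumptions made for this example.

```python
def blend_schedule(num_frames, start, end):
    """Per-frame weight in [0, 1]: 0 means motion A, 1 means motion B.

    Hypothetical helper, not taken from the paper. The transition occupies
    frames [start, end] and is eased with a smoothstep curve (an assumption).
    """
    if not 0 <= start < end <= num_frames:
        raise ValueError("expected 0 <= start < end <= num_frames")
    weights = []
    for frame in range(num_frames):
        # Normalized position inside the transition window, clamped to [0, 1].
        t = (frame - start) / (end - start)
        t = min(max(t, 0.0), 1.0)
        # Smoothstep: zero slope at both ends of the window.
        weights.append(t * t * (3.0 - 2.0 * t))
    return weights


def temporal_condition(weights):
    """Turn the weights into per-frame soft labels (weight_A, weight_B)."""
    return [(1.0 - w, w) for w in weights]


if __name__ == "__main__":
    schedule = blend_schedule(num_frames=10, start=2, end=6)
    for frame, (w_a, w_b) in enumerate(temporal_condition(schedule)):
        print(f"frame {frame}: A={w_a:.3f} B={w_b:.3f}")
```

In a setup like this, moving `start` and `end` would change when the transition happens and how long it lasts. How such a signal would actually enter the generator, for example through the skeleton-aware normalization layers, is left open by the abstract.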