CV · Feb 25

MultiAnimate: Pose-Guided Image Animation Made Extensible

arXiv:2602.21581v1
h-index: 17
Originality: Highly original
AI Analysis

This paper addresses realistic multi-character video synthesis for applications such as animation and virtual reality, a novel extension beyond single-character methods.

The paper tackles multi-character pose-guided image animation, where naive extensions of single-character methods cause identity confusion and implausible occlusions. It proposes a framework built on Diffusion Transformers with novel components that achieves state-of-the-art performance and generalizes to more characters than seen in training.

Pose-guided human image animation aims to synthesize realistic videos of a reference character driven by a sequence of poses. While diffusion-based methods have achieved remarkable success, most existing approaches are limited to single-character animation. We observe that naively extending these methods to multi-character scenarios often leads to identity confusion and implausible occlusions between characters. To address these challenges, in this paper, we propose an extensible multi-character image animation framework built upon modern Diffusion Transformers (DiTs) for video generation. At its core, our framework introduces two novel components, Identifier Assigner and Identifier Adapter, which collaboratively capture per-person positional cues and inter-person spatial relationships. This mask-driven scheme, along with a scalable training strategy, not only enhances flexibility but also enables generalization to scenarios with more characters than those seen during training. Remarkably, trained on only a two-character dataset, our model generalizes to multi-character animation while maintaining compatibility with single-character cases. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in multi-character image animation, surpassing existing diffusion-based baselines.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
