Motion Representations for Articulated Animation
This work addresses the challenge of generating realistic animations for articulated objects, which is important for applications in computer graphics and video editing, though it builds incrementally on prior keypoint-based approaches.
The paper tackles the problem of animating articulated objects by proposing novel motion representations that identify object parts and track their motions in an unsupervised manner, achieving 96.6% user preference over state-of-the-art methods on a new benchmark.
We propose novel motion representations for animating articulated objects consisting of distinct parts. In a completely unsupervised manner, our method identifies object parts, tracks them in a driving video, and infers their motions by considering their principal axes. In contrast to previous keypoint-based works, our method extracts meaningful and consistent regions, describing locations, shape, and pose. The regions correspond to semantically relevant and distinct object parts that are more easily detected in frames of the driving video. To force decoupling of foreground from background, we model non-object-related global motion with an additional affine transformation. To facilitate animation and prevent the leakage of the shape of the driving object, we disentangle shape and pose of objects in the region space. Our model can animate a variety of objects, surpassing previous methods by a large margin on existing benchmarks. We present a challenging new benchmark with high-resolution videos and show that the improvement is particularly pronounced when articulated objects are considered, reaching 96.6% user preference vs. the state of the art.
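The idea of inferring a region's motion from its principal axes can be illustrated with a small sketch. It assumes each region is available as a non-negative 2D heatmap, which the abstract does not specify, and it is not the authors' implementation. The function name `region_to_affine` and the toy heatmap are hypothetical. The sketch computes the weighted mean of the heatmap (location) and the eigendecomposition of its weighted covariance (orientation and extent), then combines them into an affine map from the unit circle to the region.

```python
import numpy as np


def region_to_affine(heatmap):
    """Estimate location and principal axes of a soft region.

    Hypothetical helper: `heatmap` is assumed to be a non-negative
    (H, W) array with a positive sum. Returns (mu, A), where mu is the
    weighted mean in (x, y) pixel coordinates and A is a 2x2 matrix
    whose columns are the principal axes scaled by the standard
    deviation along each axis, so that A @ A.T equals the covariance.
    """
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)

    # Normalize the heatmap into a probability distribution over pixels.
    weights = heatmap.ravel().astype(float)
    weights = weights / weights.sum()

    # First moment: region location.
    mu = weights @ coords

    # Second central moment: region shape and orientation.
    centered = coords - mu
    cov = (centered * weights[:, None]).T @ centered

    # Eigenvectors of the covariance are the principal axes;
    # np.linalg.eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    scales = np.sqrt(np.clip(eigvals, 0.0, None))
    A = eigvecs @ np.diag(scales)
    return mu, A


# Toy region: a horizontally elongated block inside a 9x21 frame.
heatmap = np.zeros((9, 21))
heatmap[3:6, 2:19] = 1.0
mu, A = region_to_affine(heatmap)
```

Under these assumptions, comparing the `(mu, A)` pair of the same region in two frames gives a per-region affine motion estimate; for example, `A_driving @ np.linalg.inv(A_source)` would map the source region's axes onto the driving region's axes, up to the sign ambiguity of the eigenvectors.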