Can Pose Transfer Models Generate Realistic Human Motion?
This study addresses the problem of assessing the realism and consistency of pose-transfer models for human motion generation, which matters for applications such as animation and virtual reality. The contribution is incremental, since it focuses on evaluation rather than proposing new methods.
The paper evaluates three state-of-the-art pose-transfer methods (AnimateAnyone, MagicAnimate, ExAvatar) by generating videos with out-of-distribution actions and identities. Participants correctly identified the intended action only 42.92% of the time and judged the generated actions consistent with the reference videos only 36.46% of the time.
Recent pose-transfer methods aim to generate temporally consistent and fully controllable videos of human action, in which the motion from a reference video is reenacted by a new identity. We evaluate three state-of-the-art pose-transfer methods (AnimateAnyone, MagicAnimate, and ExAvatar) by generating videos with actions and identities outside the training distribution and conducting a participant study on the quality of these videos. In a controlled environment of 20 distinct human actions, we find that participants shown the pose-transferred videos correctly identify the intended action only 42.92% of the time. Moreover, participants find the actions in the generated videos consistent with the reference (source) videos only 36.46% of the time. These results vary by method: participants find the splatting-based ExAvatar more consistent and photorealistic than the diffusion-based AnimateAnyone and MagicAnimate.