Physically Plausible Animation of Human Upper Body from a Single Image
The method enables interactive animation creation for applications such as entertainment and virtual reality, but the contribution is incremental because it builds on existing methods for 2D-to-3D simulation and pose-to-image generation.
The paper tackles the problem of generating controllable, photorealistic human animations from a single image. It achieves physically plausible upper body motion by using reinforcement learning to train a dynamic model that predicts 2D keypoints conditioned on 3D actions (joint torques), together with a policy that chooses those actions.
We present a new method for generating controllable, dynamically responsive, and photorealistic human animations. Given an image of a person, our system allows the user to generate Physically plausible Upper Body Animation (PUBA) through interaction in the image space, such as dragging the person's hand to various locations. We formulate a reinforcement learning problem to train a dynamic model that predicts the person's next 2D state (i.e., keypoints on the image) conditioned on a 3D action (i.e., joint torque), and a policy that outputs optimal actions to control the person to achieve desired goals. The dynamic model leverages the expressiveness of 3D simulation and the visual realism of 2D videos. PUBA generates 2D keypoint sequences that achieve task goals while remaining responsive to forceful perturbations. The sequences of keypoints are then translated by a pose-to-image generator to produce the final photorealistic video.
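The control loop described in the abstract (a policy picks a 3D action, a dynamic model maps it to the next 2D state, and the resulting keypoints form a sequence) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_dynamics` and `toy_policy` are hand-written stand-ins for the learned dynamic model and policy, the state is reduced to a single hand keypoint, and the perturbation is a simple displacement. It is not the paper's implementation.

```python
from typing import List, Optional, Tuple

State = Tuple[float, float]           # one 2D keypoint (x, y) in image space
Action = Tuple[float, float, float]   # 3D action, standing in for joint torques


def toy_dynamics(state: State, action: Action, dt: float = 0.1) -> State:
    """Hypothetical stand-in for the learned dynamic model.

    Maps the current 2D state and a 3D action to the next 2D state.
    """
    x, y = state
    ax, ay, _az = action  # in this toy, the depth component does not move the 2D keypoint
    return (x + dt * ax, y + dt * ay)


def toy_policy(state: State, goal: State, gain: float = 5.0) -> Action:
    """Hypothetical stand-in for the learned policy: proportional control toward the goal."""
    return (gain * (goal[0] - state[0]), gain * (goal[1] - state[1]), 0.0)


def rollout(
    start: State,
    goal: State,
    steps: int = 50,
    push_at: Optional[int] = None,
    push: Tuple[float, float] = (0.0, 0.0),
) -> List[State]:
    """Generate a 2D keypoint sequence, optionally perturbed at one step."""
    state = start
    trajectory = [state]
    for t in range(steps):
        action = toy_policy(state, goal)        # policy outputs a 3D action
        state = toy_dynamics(state, action)     # model predicts the next 2D state
        if t == push_at:                        # external push applied in image space
            state = (state[0] + push[0], state[1] + push[1])
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # Drag the hand keypoint from (0, 0) toward (1, 2), with a push at step 10.
    keypoints = rollout((0.0, 0.0), (1.0, 2.0), push_at=10, push=(0.5, -0.5))
    print(keypoints[-1])
```

In the full system, the keypoint sequence returned by a loop of this shape would be handed to the pose-to-image generator; that stage is omitted here.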