Vid2Game: Controllable Characters Extracted from Real-World Videos
This work addresses the need for realistic, controllable character animation derived from real-world videos, which is useful in entertainment and simulation. The contribution is incremental, however, since it builds on existing video generation and pose estimation methods.
The paper tackles the problem of extracting a controllable character from a single input video. The resulting model generates novel image sequences of that person against arbitrary backgrounds, driven by user-defined control signals for body displacement, and it demonstrates high-quality results on dancers and athletes.
We are given a video of a person performing a certain activity, from which we extract a controllable model. The model generates novel image sequences of that person according to arbitrary user-defined control signals, typically marking the displacement of the moving body. The generated video can have an arbitrary background and effectively capture both the dynamics and appearance of the person. The method is based on two networks. The first network maps the current pose and a single-instance control signal to the next pose. The second network maps the current pose, the new pose, and a given background to an output frame. Both networks include multiple novelties that enable high-quality performance. This is demonstrated on multiple characters extracted from various videos of dancers and athletes.
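The data flow between the two networks can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `pose2pose`, `pose2frame`, and `generate`, the 64x64 frame size, the binary-mask pose representation, and the `(dx, dy)` control format are hypothetical, and the two functions are trivial NumPy placeholders standing in for the trained networks, not the paper's architectures. Only the interfaces (what each network takes in and produces) follow the description above.

```python
import numpy as np

H, W = 64, 64  # hypothetical frame size, chosen only for this sketch


def pose2pose(pose, control):
    """Placeholder for the first network: (current pose, control) -> next pose.

    Assumption: the pose is a binary H x W mask and the control is an
    integer displacement (dx, dy). The placeholder simply shifts the mask.
    """
    dx, dy = control
    return np.roll(pose, shift=(dy, dx), axis=(0, 1))


def pose2frame(pose, next_pose, background):
    """Placeholder for the second network:
    (current pose, next pose, background) -> output frame.

    The current pose is accepted to mirror the described interface but is
    unused by this placeholder, which pastes a white silhouette (given by
    the next pose) over the background.
    """
    mask = next_pose[..., None]            # H x W x 1, broadcast over RGB
    character = np.ones_like(background)   # flat white "character"
    return mask * character + (1.0 - mask) * background


def generate(initial_pose, controls, background):
    """Autoregressive loop: each predicted pose becomes the next input."""
    pose = initial_pose
    frames = []
    for control in controls:
        next_pose = pose2pose(pose, control)
        frames.append(pose2frame(pose, next_pose, background))
        pose = next_pose
    return frames


# Usage: move a rectangular "character" 2 pixels to the right for 5 steps.
initial_pose = np.zeros((H, W))
initial_pose[20:40, 28:36] = 1.0
background = np.zeros((H, W, 3))
frames = generate(initial_pose, [(2, 0)] * 5, background)
```

The point of the sketch is the loop structure: the control signal enters only through the pose predictor, while the background enters only through the frame renderer, so either can be changed independently at generation time.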