CV · Jul 3, 2024

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang

arXiv:2407.03168v2 · 39.6 · 217 citations · h-index: 21 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for practical, controllable, and efficient portrait animation in video generation and editing applications. The contribution is incremental, since it builds on existing implicit-keypoint methods.

The paper tackles portrait animation with LivePortrait, an implicit-keypoint-based framework that efficiently synthesizes video from a single source image, taking the motion from a driving video, and reaches a generation speed of 12.8 ms on an RTX 4090 GPU.
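Neither the summary nor the abstract gives the motion transformation in closed form. A minimal sketch, assuming a face vid2vid style keypoint transformation extended with a scalar scale (the exact form LivePortrait uses may differ), computes a frame's K 3D keypoints from canonical keypoints x_c as x = s · (x_c R + δ) + t, where s is scale, R the head rotation, δ per-keypoint expression offsets, and t translation. Animating a portrait then means combining the source image's canonical keypoints with the pose and expression estimated from each driving frame. The function names and the Euler-angle convention are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def rotation_from_euler(yaw: float, pitch: float, roll: float) -> Matrix:
    """3x3 rotation from angles in radians.

    Assumed convention for this sketch: R = Rz(roll) @ Ry(yaw) @ Rx(pitch).
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return matmul(matmul(rz, ry), rx)


def transform_keypoints(x_c: Matrix, rot: Matrix, delta: Matrix,
                        t: List[float], s: float) -> Matrix:
    """x = s * (x_c @ R + delta) + t for K keypoints stored as rows (K x 3)."""
    rotated = matmul(x_c, rot)  # row-vector convention: each keypoint times R
    return [
        [s * (rotated[k][j] + delta[k][j]) + t[j] for j in range(3)]
        for k in range(len(x_c))
    ]


if __name__ == "__main__":
    # Hypothetical canonical keypoints of the source image (K = 2 for brevity).
    x_c_source = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    # Hypothetical motion estimated from one driving frame.
    rot_d = rotation_from_euler(yaw=math.radians(30), pitch=0.0, roll=0.0)
    delta_d = [[0.0, 0.05, 0.0], [0.0, -0.05, 0.0]]  # small expression offsets
    x_d = transform_keypoints(x_c_source, rot_d, delta_d, t=[0.0, 0.0, 0.0], s=1.0)
    print(x_d)
```

Keeping canonical keypoints separate from pose (R, t, s) and expression (δ) is what makes the representation controllable: any one factor can be edited or swapped from the driving frame without touching the others.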

Portrait animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability. Building upon this, we develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage. To enhance the generation quality and generalization ability, we scale up the training data to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives. Additionally, we discover that compact implicit keypoints can effectively represent a kind of blendshape, and we carefully design a stitching module and two retargeting modules, each built on a small MLP with negligible computational overhead, to enhance controllability. Experimental results demonstrate the efficacy of our framework, even compared to diffusion-based methods. The generation speed reaches 12.8 ms on an RTX 4090 GPU with PyTorch. The inference code and models are available at https://github.com/KwaiVGI/LivePortrait
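The abstract describes the stitching and retargeting modules only as small MLPs. A minimal sketch, assuming a stitching-style module that reads the source keypoints x_s and driving keypoints x_d and predicts an offset added to x_d, is shown below in plain Python. The class name, the keypoint count K = 21, the layer sizes, the ReLU activations, and the randomly initialized weights are all illustrative assumptions, not LivePortrait's published configuration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights: n_out x n_in, bias: n_out)


def init_linear(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random small weights; a real module would be trained, not sampled."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def linear(x: List[float], layer: Layer) -> List[float]:
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class TinyStitchingMLP:
    """Hypothetical stitching-style module: (x_s, x_d) -> corrected x_d.

    Flattens source and driving keypoints (K x 3 each), runs a 3-layer MLP,
    and adds the predicted offset to the driving keypoints.
    """

    def __init__(self, num_kp: int, hidden: int = 128, seed: int = 0) -> None:
        rng = random.Random(seed)
        d = num_kp * 3
        self.num_kp = num_kp
        self.layers = [
            init_linear(2 * d, hidden, rng),
            init_linear(hidden, hidden, rng),
            init_linear(hidden, d, rng),
        ]

    def num_params(self) -> int:
        return sum(len(w) * len(w[0]) + len(b) for w, b in self.layers)

    def __call__(self, x_s: Matrix, x_d: Matrix) -> Matrix:
        h = [c for kp in x_s for c in kp] + [c for kp in x_d for c in kp]
        for i, layer in enumerate(self.layers):
            h = linear(h, layer)
            if i < len(self.layers) - 1:  # ReLU on hidden layers only
                h = [max(0.0, v) for v in h]
        # h is the flattened offset; reshape to K x 3 and add it to x_d.
        return [[x_d[k][j] + h[3 * k + j] for j in range(3)] for k in range(self.num_kp)]


if __name__ == "__main__":
    K = 21  # illustrative keypoint count (assumption)
    mlp = TinyStitchingMLP(num_kp=K)
    rng = random.Random(1)
    x_s = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(K)]
    x_d = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(K)]
    x_d_stitched = mlp(x_s, x_d)
    print(mlp.num_params(), len(x_d_stitched), len(x_d_stitched[0]))  # 40895 21 3
```

With these sizes the module has 40,895 parameters and runs on keypoints rather than pixels, so its cost is far below that of a convolutional image generator, which is the intuition behind "negligible computational overhead". The two retargeting modules could plausibly follow the same pattern with an extra conditioning scalar appended to the input; that is also an assumption rather than the paper's specification.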

