GaussianSwap: Animatable Video Face Swapping with 3D Gaussian Splatting
GaussianSwap addresses the problem of creating animatable and manipulable face-swapped videos for applications in entertainment and media, representing a paradigm shift from pixel-based methods to avatar-based generation.
The paper tackles video face swapping by creating a 3D Gaussian Splatting-based avatar from a target video and transferring identity from a source image, achieving superior identity preservation, visual clarity, and temporal consistency while enabling interactive applications.
We introduce GaussianSwap, a novel video face swapping framework that constructs a 3D Gaussian Splatting-based face avatar from a target video while transferring identity from a source image to the avatar. Conventional video swapping frameworks are limited to generating facial representations in pixel-based formats. The resulting swapped faces exist merely as a set of unstructured pixels without any capacity for animation or interactive manipulation. Our work introduces a paradigm shift from conventional pixel-based video generation to the creation of high-fidelity avatars with swapped faces. The framework first preprocesses the target video to extract FLAME parameters, camera poses, and segmentation masks, and then rigs 3D Gaussian splats to the FLAME model across frames, enabling dynamic facial control. To preserve identity, we propose a compound identity embedding constructed from three state-of-the-art face recognition models for avatar finetuning. Finally, we render the face-swapped avatar onto the background frames to obtain the face-swapped video. Experimental results demonstrate that GaussianSwap achieves superior identity preservation, visual clarity, and temporal consistency, while enabling previously unattainable interactive applications.
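The abstract does not specify how the compound identity embedding is assembled or how it enters the finetuning objective. As a minimal sketch only, the following assumes that the three per-model embeddings are each L2-normalized, concatenated, and renormalized, and that identity is scored with a cosine-distance loss. The function names (`l2_normalize`, `compound_embedding`, `identity_loss`) and the toy vectors are hypothetical and stand in for the outputs of real face recognition models; none of this is taken from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def compound_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Assumed construction: normalize each model's embedding so that no
    single model dominates, concatenate them, then renormalize the result."""
    combined: List[float] = []
    for e in embeddings:
        combined.extend(l2_normalize(e))
    return l2_normalize(combined)


def identity_loss(
    source: Sequence[Sequence[float]],
    rendered: Sequence[Sequence[float]],
) -> float:
    """Assumed loss: cosine distance between the compound embedding of the
    source image and that of a rendered avatar frame (0 means identical)."""
    s = compound_embedding(source)
    r = compound_embedding(rendered)
    return 1.0 - sum(a * b for a, b in zip(s, r))


# Toy stand-ins for the outputs of three face recognition models.
# Real embeddings would typically have hundreds of dimensions.
source_embs = [[1.0, 0.0], [0.0, 2.0, 0.0], [3.0, 4.0]]
rendered_embs = [[0.9, 0.1], [0.1, 1.8, 0.0], [3.0, 3.5]]

loss = identity_loss(source_embs, rendered_embs)
print(f"identity loss: {loss:.4f}")
```

In an actual finetuning loop, a loss of this kind would need to be computed with a differentiable framework so that gradients can flow from the rendered frame back to the Gaussian splat parameters; the pure-Python version above only illustrates the arithmetic.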