GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images
This work addresses limitations of existing face animation methods for computer vision applications. It offers a more stable and generalizable method that needs neither videos nor annotated data.
The paper tackles facial expression translation between unpaired in-the-wild images by introducing the GaFET framework. GaFET uses parametric 3D representations and a transformer to produce higher-quality and more accurate results than state-of-the-art methods, and it applies to various poses and complex textures.
While current face animation methods can manipulate expressions individually, they suffer from several limitations. Expressions manipulated by some motion-based facial reenactment models are crude, while other approaches modeled with facial action units cannot generalize to arbitrary expressions not covered by the annotations. In this paper, we introduce a novel Geometry-aware Facial Expression Translation (GaFET) framework, which is based on parametric 3D facial representations and can stably decouple expression. Within this framework, a Multi-level Feature Aligned Transformer is proposed to complement non-geometric facial detail features while addressing the alignment challenge of spatial features. Further, we design a De-expression model based on StyleGAN to reduce the learning difficulty of GaFET on unpaired "in-the-wild" images. Extensive qualitative and quantitative experiments demonstrate that we achieve higher-quality and more accurate facial expression transfer results than state-of-the-art methods, and demonstrate applicability to various poses and complex textures. Moreover, our method requires neither videos nor annotated training data, making it easier to use and generalize.
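To illustrate why a parametric 3D representation makes expression "stably decoupled", here is a minimal sketch. It assumes a generic linear 3D morphable model, S = S_mean + B_id·alpha + B_exp·beta. The abstract does not say which parametric model GaFET uses, so that choice is an assumption. The names `FaceParams`, `reconstruct_shape`, and `translate_expression` are hypothetical, and the random bases are toy stand-ins for a real model. Translating an expression at the geometry level then reduces to keeping the source's identity and pose coefficients and borrowing the driver's expression coefficients. The sketch covers only this geometric step. It does not model the paper's Multi-level Feature Aligned Transformer or its StyleGAN-based De-expression model.

```python
import numpy as np
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FaceParams:
    """Hypothetical linear-3DMM coefficients for one face (an assumption, not GaFET's exact representation)."""
    identity: np.ndarray    # alpha: identity/shape coefficients
    expression: np.ndarray  # beta: expression coefficients
    pose: np.ndarray        # head pose parameters (not used by the shape model below)


def reconstruct_shape(mean_shape, id_basis, exp_basis, params):
    """Assumed linear 3DMM: S = S_mean + B_id @ alpha + B_exp @ beta (flattened vertex coordinates)."""
    return mean_shape + id_basis @ params.identity + exp_basis @ params.expression


def translate_expression(source, driver):
    """Keep the source identity and pose; take only the driver's expression coefficients."""
    return replace(source, expression=driver.expression.copy())


# Toy model: random bases stand in for a real morphable model (assumption for illustration only).
rng = np.random.default_rng(0)
n_vertices, n_id, n_exp = 100, 8, 5
mean_shape = rng.normal(size=n_vertices * 3)
id_basis = rng.normal(size=(n_vertices * 3, n_id))
exp_basis = rng.normal(size=(n_vertices * 3, n_exp))

source = FaceParams(rng.normal(size=n_id), rng.normal(size=n_exp), np.zeros(6))
driver = FaceParams(rng.normal(size=n_id), rng.normal(size=n_exp), np.ones(6))

edited = translate_expression(source, driver)
edited_shape = reconstruct_shape(mean_shape, id_basis, exp_basis, edited)
```

Because identity and expression occupy separate coefficient vectors in this assumed model, swapping `expression` cannot leak the driver's identity into the result. Geometry alone does not carry wrinkles, teeth, or texture, which is the gap the abstract says the transformer's non-geometric detail features are meant to fill.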