RigAnyFace: Scaling Neural Facial Mesh Auto-Rigging with Unlabeled Data
This work addresses scalable facial animation rigging for the computer graphics industry. It reduces reliance on costly manual rigging and supports more detailed expressions, though its advance over prior neural auto-rigging techniques is incremental.
The paper tackles automatic rigging of facial meshes with diverse topologies, including meshes with multiple disconnected components. It introduces RigAnyFace (RAF), a neural framework that deforms a neutral mesh into expressive blendshape poses using a triangulation-agnostic network, together with a 2D supervision strategy that exploits unlabeled data. The result is improved accuracy and generalizability over previous methods.
In this paper, we present RigAnyFace (RAF), a scalable neural auto-rigging framework for facial meshes of diverse topologies, including those with multiple disconnected components. RAF deforms a static neutral facial mesh into industry-standard FACS poses to form an expressive blendshape rig. Deformations are predicted by a triangulation-agnostic surface learning network, augmented with a tailored architecture design that conditions on FACS parameters and efficiently processes disconnected components. For training, we curated a dataset of facial meshes, with a subset meticulously rigged by professional artists to serve as accurate 3D ground truth for deformation supervision. Due to the high cost of manual rigging, this subset is limited in size, constraining the generalization ability of models trained exclusively on it. To address this, we design a 2D supervision strategy for unlabeled neutral meshes without rigs. This strategy increases data diversity and allows for scaled training, thereby enhancing the generalization ability of models trained on this augmented data. Extensive experiments demonstrate that RAF is able to rig meshes of diverse topologies, not only on our artist-crafted assets but also on in-the-wild samples, outperforming previous works in accuracy and generalizability. Moreover, our method advances beyond prior work by supporting multiple disconnected components, such as eyeballs, for more detailed expression animation. Project page: https://wenchao-m.github.io/RigAnyFace.github.io
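For readers unfamiliar with blendshape rigs, the following is a minimal sketch of how a rig made of a neutral mesh and a set of posed meshes is commonly evaluated at animation time. It assumes the standard linear (delta) blendshape model and is not taken from RAF itself. The function name, the pose names, and the toy two-vertex mesh are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]


def evaluate_blendshape_rig(
    neutral: List[Vertex],
    poses: Dict[str, List[Vertex]],
    weights: Dict[str, float],
) -> List[Vertex]:
    """Linear blend: v = neutral + sum_k w_k * (pose_k - neutral)."""
    result = [list(v) for v in neutral]
    for name, weight in weights.items():
        pose = poses[name]
        if len(pose) != len(neutral):
            raise ValueError(f"pose {name!r} has a different vertex count")
        for i, (p, n) in enumerate(zip(pose, neutral)):
            for axis in range(3):
                # Add this pose's weighted offset from the neutral vertex.
                result[i][axis] += weight * (p[axis] - n[axis])
    return [(v[0], v[1], v[2]) for v in result]


# Toy data (hypothetical): two vertices and two made-up pose names.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
poses = {
    "jawOpen": [(0.0, -1.0, 0.0), (1.0, -1.0, 0.0)],
    "browRaise": [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)],
}
blended = evaluate_blendshape_rig(
    neutral, poses, {"jawOpen": 0.5, "browRaise": 1.0}
)
```

In this model, an auto-rigging method only has to supply the posed meshes. The animation weights are then applied independently of how those poses were produced, whether by an artist or by a network.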