Omni-ID: Holistic Identity Representation Designed for Generative Tasks
Omni-ID addresses the need for better identity preservation in facial generation for applications such as animation and virtual avatars, though the contribution appears incremental because it builds on existing generative frameworks.
The paper tackles the problem of facial identity representation for generative tasks by introducing Omni-ID, a holistic representation that encodes an individual's appearance across expressions and poses. The authors report substantial improvements over conventional representations such as CLIP and ArcFace across various generative tasks.
We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. It consolidates information from a variable number of unstructured input images into a structured representation, where each entry represents certain global or local identity features. Our approach uses a few-to-many identity reconstruction training paradigm, where a limited set of input images is used to reconstruct multiple target images of the same individual in various poses and expressions. A multi-decoder framework is further employed to leverage the complementary strengths of diverse decoders during training. Unlike conventional representations, such as CLIP and ArcFace, which are typically learned through discriminative or contrastive objectives, Omni-ID is optimized with a generative objective, resulting in a more comprehensive and nuanced identity capture for generative tasks. Trained on our MFHQ dataset, a multi-view facial image collection, Omni-ID demonstrates substantial improvements over conventional representations across various generative tasks.
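The few-to-many sampling and the fixed-size consolidation described in the abstract can be illustrated with a small sketch. This is not the Omni-ID implementation: the attention-style pooling over a fixed set of query vectors, the disjoint input/target split, and the names `split_few_to_many` and `pool_to_fixed_size` are assumptions chosen for illustration, and plain Python lists stand in for learned image features.

```python
import math
import random


def split_few_to_many(image_ids, num_inputs, num_targets, rng):
    """Pick a few input images and several target images of one identity.

    Assumption: inputs and targets are sampled as disjoint subsets.
    """
    if num_inputs + num_targets > len(image_ids):
        raise ValueError("not enough images for this identity")
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    inputs = shuffled[:num_inputs]
    targets = shuffled[num_inputs:num_inputs + num_targets]
    return inputs, targets


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_to_fixed_size(features, queries):
    """Pool any number of feature vectors into len(queries) entries.

    Assumption: each query vector attends over all input features, so the
    output size depends only on the number of queries, not on the number
    of input images.
    """
    dim = len(queries[0])
    entries = []
    for query in queries:
        scores = [
            sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
            for feat in features
        ]
        weights = softmax(scores)
        entries.append([
            sum(w * feat[d] for w, feat in zip(weights, features))
            for d in range(dim)
        ])
    return entries


# Hypothetical usage: 10 images of one identity, 2 inputs, 5 targets.
rng = random.Random(0)
inputs, targets = split_few_to_many(range(10), num_inputs=2, num_targets=5, rng=rng)

# Three query vectors of dimension 4 (random here; learned in a real model).
queries = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
for num_images in (1, 2, 6):
    feats = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(num_images)]
    representation = pool_to_fixed_size(feats, queries)
    print(num_images, "input image(s) ->", len(representation), "entries")
```

In this sketch the number of entries is set by the number of queries alone, which is one way a representation can stay fixed-size while the number of input images varies. In a training loop, one or more decoders would then reconstruct the target images from these entries.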