CVNov 1, 2021Code
FaceScape: 3D Facial Dataset and Benchmark for Single-View 3D Face ReconstructionHao Zhu, Haotian Yang, Longwei Guo et al.
In this paper, we present a large-scale detailed 3D face dataset, FaceScape, and the corresponding benchmark to evaluate single-view facial 3D reconstruction. By training on FaceScape data, a novel algorithm is proposed to predict elaborate riggable 3D face models from a single image input. FaceScape dataset releases $16,940$ textured 3D faces, captured from $847$ subjects and each with $20$ specific expressions. The 3D models contain the pore-level facial geometry that is also processed to be topologically uniform. These fine 3D facial models can be represented as a 3D morphable model for coarse shapes and displacement maps for detailed geometry. Taking advantage of the large-scale and high-accuracy dataset, a novel algorithm is further proposed to learn the expression-specific dynamic details using a deep neural network. The learned relationship serves as the foundation of our 3D face prediction system from a single image input. Different from most previous methods, our predicted 3D models are riggable with highly detailed geometry under different expressions. We also use FaceScape data to generate the in-the-wild and in-the-lab benchmark to evaluate recent methods of single-view face reconstruction. The accuracy is reported and analyzed on the dimensions of camera pose and focal length, which provides a faithful and comprehensive evaluation and reveals new challenges. The unprecedented dataset, benchmark, and code have been released at https://github.com/zhuhao-nju/facescape.
CVOct 11, 2025
ReMix: Towards a Unified View of Consistent Character Generation and EditingBenjia Zhou, Bin Fu, Pei Cheng et al.
Recent advances in large-scale text-to-image diffusion models (e.g., FLUX.1) have greatly improved visual fidelity in consistent character generation and editing. However, existing methods rarely unify these tasks within a single framework. Generation-based approaches struggle with fine-grained identity consistency across instances, while editing-based methods often lose spatial controllability and instruction alignment. To bridge this gap, we propose ReMix, a unified framework for character-consistent generation and editing. It constitutes two core components: the ReMix Module and IP-ControlNet. The ReMix Module leverages the multimodal reasoning ability of MLLMs to edit semantic features of input images and adapt instruction embeddings to the native DiT backbone without fine-tuning. While this ensures coherent semantic layouts, pixel-level consistency and pose controllability remain challenging. To address this, IP-ControlNet extends ControlNet to decouple semantic and layout cues from reference images and introduces an ε-equivariant latent space that jointly denoises the reference and target images within a shared noise space. Inspired by convergent evolution and quantum decoherence,i.e., where environmental noise drives state convergence, this design promotes feature alignment in the hidden space, enabling consistent object generation while preserving identity. ReMix supports a wide range of tasks, including personalized generation, image editing, style transfer, and multi-condition synthesis. Extensive experiments validate its effectiveness and efficiency as a unified framework for character-consistent image generation and editing.
CVMar 31, 2020
FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face PredictionHaotian Yang, Hao Zhu, Yanru Wang et al.
In this paper, we present a large-scale detailed 3D face dataset, FaceScape, and propose a novel algorithm that is able to predict elaborate riggable 3D face models from a single image input. FaceScape dataset provides 18,760 textured 3D faces, captured from 938 subjects and each with 20 specific expressions. The 3D models contain the pore-level facial geometry that is also processed to be topologically uniformed. These fine 3D facial models can be represented as a 3D morphable model for rough shapes and displacement maps for detailed geometry. Taking advantage of the large-scale and high-accuracy dataset, a novel algorithm is further proposed to learn the expression-specific dynamic details using a deep neural network. The learned relationship serves as the foundation of our 3D face prediction system from a single image input. Different than the previous methods, our predicted 3D models are riggable with highly detailed geometry under different expressions. The unprecedented dataset and code will be released to public for research purpose.