Face Inverse Rendering via Hierarchical Decoupling
This work aims to make face inverse rendering more accessible and effective for everyday users by reducing reliance on synthetic data and professional equipment, although its improvement in generalization is incremental.
The paper tackles the problem of face inverse rendering in real-world settings by proposing a deep learning framework that disentangles face images into albedo, normal, and lighting components, achieving superior performance in face relighting compared to state-of-the-art methods.
Previous face inverse rendering methods often require synthetic data with ground truth and/or professional equipment such as a lighting stage. However, a model trained on synthetic data or with pre-defined lighting priors typically fails to generalize to real-world situations because of the gap between synthetic data or lighting priors and real data. Furthermore, the professional equipment and skill required make the task expensive and complex for ordinary users. In this paper, we propose a deep learning framework that disentangles face images in the wild into their corresponding albedo, normal, and lighting components. Specifically, we build a decomposition network with a hierarchical subdivision strategy that takes image pairs captured from arbitrary viewpoints as input. In this way, our approach greatly eases the burden of data preparation and significantly broadens the applicability of face inverse rendering. Extensive experiments demonstrate the efficacy of our design and show its superior performance in face relighting over other state-of-the-art alternatives. Our code is available at \url{https://github.com/AutoHDR/HD-Net.git}.
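To make the role of the three components concrete, the following is a minimal sketch of how albedo, normal, and lighting can be recombined into an image, and how relighting then reduces to swapping the lighting term. The abstract does not state an image formation model, so the sketch assumes a Lambertian surface lit by second-order spherical harmonics, a common choice in face inverse rendering; it is not taken from the HD-Net code. The function names `sh_basis`, `shade`, and `render`, the grayscale per-pixel layout, and the convention that the nine lighting coefficients already absorb the cosine-lobe factors are all illustrative assumptions.

```python
import math


def sh_basis(unit_normal):
    """Evaluate the 9 real spherical-harmonic basis functions (bands 0-2)
    at a unit-length surface normal (x, y, z)."""
    x, y, z = unit_normal
    return [
        0.282095,                        # band 0 (constant term)
        0.488603 * y,                    # band 1
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,                # band 2
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ]


def shade(normal, light):
    """Shading at one pixel: dot product of the SH basis at the normalized
    normal with 9 lighting coefficients (assumed to include any
    cosine-lobe scaling)."""
    length = math.sqrt(sum(c * c for c in normal))
    unit = [c / length for c in normal]
    return sum(b * l for b, l in zip(sh_basis(unit), light))


def render(albedo, normals, light):
    """Recompose a grayscale image as albedo * shading, pixel by pixel.
    Negative shading is clamped to zero (an assumption of this sketch)."""
    return [a * max(shade(n, light), 0.0) for a, n in zip(albedo, normals)]


# Relighting under this model: keep albedo and normals, change the lighting.
albedo = [0.5, 0.8]                            # hypothetical per-pixel albedo
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]  # hypothetical per-pixel normals
ambient = [1.0 / 0.282095] + [0.0] * 8         # constant light, shading = 1
frontal = [0.0, 0.0, 1.0] + [0.0] * 6          # light varying along +z only

image_ambient = render(albedo, normals, ambient)
image_frontal = render(albedo, normals, frontal)
```

Under the ambient lighting the rendered image reproduces the albedo, while the frontal lighting darkens the pixel whose normal faces away from the light. In a learned pipeline, the albedo, normals, and lighting coefficients would be predicted by the decomposition network rather than supplied by hand.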