CV, GR | Dec 12, 2025

WildCap: Facial Appearance Capture in the Wild via Hybrid Inverse Rendering

arXiv:2512.11237v1 | 1 citation | h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the need for more accessible and cost-effective facial capture in applications such as gaming and virtual reality, though it is an incremental improvement over existing methods.

The paper tackles the problem of high-quality facial appearance capture from smartphone videos in uncontrolled lighting, achieving results that significantly close the quality gap between in-the-wild and controlled recordings.

Existing methods achieve high-quality facial appearance capture under controllable lighting, which increases capture cost and limits usability. We propose WildCap, a novel method for high-quality facial appearance capture from a smartphone video recorded in the wild. To disentangle high-quality reflectance from complex lighting effects in in-the-wild captures, we propose a novel hybrid inverse rendering framework. Specifically, we first apply a data-driven method, i.e., SwitchLight, to convert the captured images into more constrained conditions and then adopt model-based inverse rendering. However, unavoidable local artifacts in network predictions, such as shadow-baking, are non-physical and thus hinder accurate inverse rendering of lighting and material. To address this, we propose a novel texel grid lighting model to explain non-physical effects as clean albedo illuminated by local physical lighting. During optimization, we jointly sample a diffusion prior for reflectance maps and optimize the lighting, effectively resolving scale ambiguity between local lights and albedo. Our method achieves significantly better results than prior art in the same capture setup, closing the quality gap between in-the-wild and controllable recordings by a large margin. Our code will be released at https://yxuhan.github.io/WildCap/index.html.
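
The scale ambiguity between local lights and albedo mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a purely diffuse (Lambertian) image formation in which each texel's color is its albedo multiplied by a local shading value, and the names `render_diffuse`, `albedo`, `shading`, and `k` are hypothetical. It shows that an arbitrary per-texel rescaling of albedo, compensated by the inverse rescaling of local lighting, reproduces the same image, which is why an extra constraint such as a reflectance prior is needed.

```python
import numpy as np

def render_diffuse(albedo, shading):
    """Toy diffuse image formation: per-texel albedo times per-texel shading."""
    return albedo * shading

rng = np.random.default_rng(0)

# Hypothetical 4x4 texture: RGB albedo and one scalar local light per texel.
albedo = rng.uniform(0.2, 0.8, size=(4, 4, 3))
shading = rng.uniform(0.5, 1.5, size=(4, 4, 1))
image = render_diffuse(albedo, shading)

# Arbitrary per-texel scale factors (assumption: any positive values work).
k = rng.uniform(0.5, 2.0, size=(4, 4, 1))

# Brighter albedo under dimmer local light (or vice versa) gives the same image.
image_rescaled = render_diffuse(albedo * k, shading / k)

# Largest per-pixel difference between the two renderings.
ambiguity_gap = float(np.max(np.abs(image - image_rescaled)))
print(f"max difference between the two explanations: {ambiguity_gap:.2e}")
```

Because the photometric error alone cannot distinguish the two explanations, the image gives no signal about which albedo is correct. In this toy setting, only an additional constraint on plausible albedo values could break the tie.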
