ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration
This work targets identity fidelity in face restoration, with applications such as photo enhancement. Its contribution is incremental in that it adapts existing latent diffusion model (LDM) methods rather than introducing a new generative framework.
The authors address the tendency of blind face image restoration to produce facial appearances that do not match the real person. They propose ReF-LDM, a latent diffusion model that takes one low-quality input image and multiple high-quality reference images and generates a high-quality output, and they report improved identity preservation on their FFHQ-Ref dataset.
While recent works on blind face image restoration have successfully produced impressive high-quality (HQ) images with abundant details from low-quality (LQ) input images, the generated content may not accurately reflect the real appearance of a person. To address this problem, incorporating well-shot personal images as additional reference inputs could be a promising strategy. Inspired by the recent success of the Latent Diffusion Model (LDM), we propose ReF-LDM, an adaptation of LDM designed to generate HQ face images conditioned on one LQ image and multiple HQ reference images. Our model integrates an effective and efficient mechanism, CacheKV, to leverage the reference images during the generation process. Additionally, we design a timestep-scaled identity loss, enabling our LDM-based model to focus on learning the discriminative features of human faces. Lastly, we construct FFHQ-Ref, a dataset consisting of 20,405 HQ face images with corresponding reference images, which can serve as both training and evaluation data for reference-based face restoration models.
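The abstract names CacheKV but does not specify how it works. As a rough illustration only, the sketch below assumes that keys and values derived from the reference images are computed once, stored, and then appended to the keys and values of the image being restored at every denoising step. The class `ReferenceKVCache`, the function `attend`, and the toy two-dimensional tokens are all hypothetical and are not taken from the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_out = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim_out)]
        )
    return outputs


class ReferenceKVCache:
    """Hypothetical cache (an assumption, not the paper's code):
    reference keys/values are stored once and reused at every step."""

    def __init__(self, ref_keys, ref_values):
        self.keys = list(ref_keys)
        self.values = list(ref_values)

    def attend(self, queries, keys, values):
        # Queries from the image being restored attend jointly to its own
        # tokens and to the cached reference tokens.
        return attend(queries, keys + self.keys, values + self.values)


# Toy example: one reference token, one token from the image being restored.
cache = ReferenceKVCache(ref_keys=[[1.0, 0.0]], ref_values=[[10.0, 0.0]])
for step in range(3):  # stand-in for denoising steps; the cache is never recomputed
    out = cache.attend(queries=[[1.0, 0.0]], keys=[[0.0, 1.0]], values=[[0.0, 10.0]])
```

In this toy setup the query aligns with the cached reference key, so the output leans toward the reference value. Under the stated assumption, the reference images cost one encoding pass rather than one pass per denoising step, which is one plausible reading of the "efficient" claim.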