FT-TDR: Frequency-guided Transformer and Top-Down Refinement Network for Blind Face Inpainting
This work addresses the problem of reconstructing corrupted face images without explicit mask information, for applications such as photo editing. The contribution is incremental, since it builds on existing inpainting techniques.
The paper tackles blind face inpainting with a two-stage method. A transformer with frequency guidance first detects the corrupted regions, and a top-down refinement network then restores plausible content hierarchically. The authors report that the method outperforms state-of-the-art blind and non-blind face inpainting methods both qualitatively and quantitatively.
Blind face inpainting refers to the task of reconstructing visual contents without explicitly indicating the corrupted regions in a face image. Inherently, this task faces two challenges: (1) how to detect various mask patterns of different shapes and contents; (2) how to restore visually plausible and pleasing contents in the masked regions. In this paper, we propose a novel two-stage blind face inpainting method named Frequency-guided Transformer and Top-Down Refinement Network (FT-TDR) to tackle these challenges. Specifically, we first use a transformer-based network to detect the corrupted regions to be inpainted as masks by modeling the relation among different patches. We also exploit the frequency modality as complementary information for improved detection results and capture the local contextual incoherence to enhance boundary consistency. Then a top-down refinement network is proposed to hierarchically restore features at different levels and generate contents that are semantically consistent with the unmasked face regions. Extensive experiments demonstrate that our method outperforms current state-of-the-art blind and non-blind face inpainting methods qualitatively and quantitatively.
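To make the detection stage's two inputs more concrete, here is a minimal sketch of how an image could be split into patches for a transformer and how a frequency-domain cue could be extracted. This is not the authors' implementation: the abstract does not say how the frequency modality is computed, so the FFT high-pass filter below is a swapped-in, generic stand-in. The function names `high_pass_residual` and `patchify`, the `cutoff` parameter, and the patch size of 16 are all hypothetical.

```python
import numpy as np


def high_pass_residual(gray, cutoff=0.1):
    """Return the high-frequency part of a 2-D grayscale image.

    Assumption: a simple FFT high-pass filter stands in for the paper's
    unspecified frequency modality. `cutoff` (hypothetical) is the radius,
    in cycles per pixel, of the low-frequency disc that is zeroed out.
    """
    h, w = gray.shape
    # Centre the zero frequency so the low-frequency disc is easy to mask.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    # Remove low frequencies (including the DC term at radius 0).
    spectrum[radius <= cutoff] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


def patchify(image, patch=16):
    """Split an (H, W) or (H, W, C) image into flattened, non-overlapping
    patches of shape (num_patches, patch * patch * C), in row-major order.
    """
    if image.ndim == 2:
        image = image[..., None]
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of patch")
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    # Reorder to (patch_row, patch_col, row_in_patch, col_in_patch, channel).
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, patch * patch * c)
```

In a pipeline of this kind, the patch tokens from the image and from its high-frequency residual would be fed to the transformer, which models relations among patches to predict a mask. How the two modalities are actually fused is left to the paper itself.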