CV · Jan 19, 2024

Learning Position-Aware Implicit Neural Network for Real-World Face Inpainting

arXiv:2401.10537v1 · 5 citations · Pattern Recognition
Originality: Incremental advance
AI Analysis

This paper addresses realistic face inpainting for photo editing and computer vision applications. It appears incremental: it builds on existing deep learning backbones and adds enhancements specific to position modeling.

The paper tackles face inpainting for arbitrary-shaped images in real-world scenarios, where existing methods produce visually unpleasant results in position-sensitive details such as the eyes and nose. It proposes an Implicit Neural Inpainting Network (IN^2) that explicitly models position information, and it reports superior performance in extensive experiments.

Face inpainting requires the model to have a precise global understanding of the facial position structure. Benefiting from the powerful capabilities of deep learning backbones, recent works in face inpainting have achieved decent performance in the ideal setting (square images of $512$ px). However, existing methods often produce visually unpleasant results, especially in position-sensitive details (e.g., eyes and nose), when directly applied to arbitrary-shaped images in real-world scenarios. These artifacts indicate that existing methods fall short in processing position information. In this paper, we propose an Implicit Neural Inpainting Network (IN$^2$) to handle arbitrary-shaped face images in real-world scenarios by explicitly modeling position information. Specifically, a downsample processing encoder is proposed to reduce information loss while obtaining the global semantic feature. A neighbor hybrid attention block is proposed with a hybrid attention mechanism to improve the facial understanding ability of the model without restricting the shape of the input. Finally, an implicit neural pyramid decoder is introduced to explicitly model position information and bridge the gap between low-resolution features and high-resolution output. Extensive experiments demonstrate the superiority of the proposed method on the real-world face inpainting task.
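
To make the idea of an implicit decoder concrete, here is a minimal NumPy sketch of coordinate-based decoding from a low-resolution feature map to an output of arbitrary shape. It is not the paper's implicit neural pyramid decoder: the nearest-cell lookup, the relative-offset encoding, the two-layer MLP, the random weights, and the names `make_coords` and `implicit_decode` are all assumptions chosen for illustration.

```python
import numpy as np


def make_coords(h, w):
    """Pixel-centre coordinates in [-1, 1], shape (h, w, 2), ordered (y, x)."""
    ys = (np.arange(h) + 0.5) / h * 2.0 - 1.0
    xs = (np.arange(w) + 0.5) / w * 2.0 - 1.0
    return np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1)


def implicit_decode(feat, out_h, out_w, w1, b1, w2, b2):
    """Decode a (fh, fw, c) feature map into an (out_h, out_w, 3) image.

    Each output pixel is predicted from its nearest feature vector plus its
    offset from that feature cell's centre, so position is an explicit input.
    """
    fh, fw, _ = feat.shape
    q = make_coords(out_h, out_w)  # query coordinates of the output pixels

    # Index of the feature cell that contains each query coordinate.
    iy = np.clip(np.floor((q[..., 0] + 1.0) / 2.0 * fh).astype(int), 0, fh - 1)
    ix = np.clip(np.floor((q[..., 1] + 1.0) / 2.0 * fw).astype(int), 0, fw - 1)

    # Offset from the cell centre, rescaled so it lies in [-1, 1].
    rel = (q - make_coords(fh, fw)[iy, ix]) * np.array([fh, fw])

    x = np.concatenate([feat[iy, ix], rel], axis=-1)  # (out_h, out_w, c + 2)
    hidden = np.maximum(x @ w1 + b1, 0.0)             # ReLU layer
    return hidden @ w2 + b2                           # RGB prediction


# Hypothetical sizes: an 8x6 feature map decoded to a non-square 37x23 output.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 6, 16))
w1, b1 = rng.standard_normal((18, 32)) * 0.1, np.zeros(32)
w2, b2 = rng.standard_normal((32, 3)) * 0.1, np.zeros(3)
out = implicit_decode(feat, 37, 23, w1, b1, w2, b2)
print(out.shape)
```

Because the output size is only a query-grid parameter, the same weights can be evaluated at any height and width, which is the property that makes coordinate-based decoders attractive for arbitrary-shaped inputs.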

