CV, Feb 10, 2022

NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN

arXiv:2202.05009v1, 30 citations
Originality: Incremental advance
AI Analysis

This work improves image inpainting quality for applications in photo editing and content creation, though it appears incremental as it builds on existing VQGAN and sequence-to-sequence methods.

The paper tackles language-guided image inpainting by addressing two encoding problems in existing models: receptive spreading of defective regions and information loss in non-defective regions. The resulting model, NÜWA-LIP, outperforms strong baselines on three open-domain benchmarks.

Language guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping non-defective regions unchanged. However, the encoding process of existing models suffers from either receptive spreading of defective regions or information loss of non-defective regions, giving rise to visually unappealing inpainting results. To address these issues, this paper proposes NÜWA-LIP, which combines a defect-free VQGAN (DF-VQGAN) with multi-perspective sequence-to-sequence (MP-S2S). In particular, DF-VQGAN introduces relative estimation to control receptive spreading and adopts symmetrical connections to protect information. MP-S2S further enhances visual information from complementary perspectives, including both low-level pixels and high-level tokens. Experiments show that DF-VQGAN is more robust than VQGAN. To evaluate the inpainting performance of our model, we built three open-domain benchmarks, on which NÜWA-LIP is also superior to recent strong baselines.
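
The task constraint stated in the abstract (fill the defective regions, leave the non-defective regions unchanged) can be illustrated with a minimal sketch. This is not the paper's DF-VQGAN or MP-S2S. The function name `composite`, the toy grayscale grids, and the mask convention (1 = defective, 0 = non-defective) are all assumptions made for illustration.

```python
def composite(original, generated, mask):
    """Combine two same-sized pixel grids using a binary mask.

    Hypothetical helper, not from the paper: where mask is 1 (defective)
    the generated pixel is used; where mask is 0 the original pixel is kept.
    """
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]


# Toy 2x3 grayscale image; the middle column is marked as defective.
original = [[10, 20, 30], [40, 50, 60]]
generated = [[11, 99, 31], [41, 98, 61]]  # stand-in for any model's output
mask = [[0, 1, 0], [0, 1, 0]]

result = composite(original, generated, mask)
print(result)
```

Only the masked column takes values from `generated`; every other pixel is copied from `original`, which is the "non-defective regions unchanged" requirement in its simplest form.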

