CV · Feb 13, 2023

Learning to Scale Temperature in Masked Self-Attention for Image Inpainting

arXiv:2302.06130v1 · 4 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work improves image inpainting quality for computer vision applications, though it appears to be an incremental advance because it builds on existing self-attention methods.

The paper tackles image inpainting by redesigning how the temperature parameter is learned in self-attention, addressing the artifacts and training problems of earlier designs. The result is more natural inpainting, with better perceptual quality and quantitative metrics across multiple datasets.

Recent advances in deep generative adversarial networks (GANs) and self-attention mechanisms have led to significant improvements in the challenging task of inpainting large missing regions in an image. These methods integrate self-attention into neural networks to utilize surrounding neural elements based on their correlation, which helps the networks capture long-range dependencies. Temperature is a parameter of the Softmax function used in self-attention, and it can bias the distribution of attention scores towards a handful of similar patches. Most existing self-attention mechanisms for image inpainting are convolution-based and set the temperature to a constant, performing patch matching in a limited feature space. In this work, we analyze the artifacts and training problems in previous self-attention mechanisms, and redesign the temperature learning network as well as the self-attention mechanism to address them. We present an image inpainting framework with a multi-head temperature masked self-attention mechanism, which provides stable and efficient temperature learning and uses multiple sources of distant contextual information for high-quality image inpainting. In addition to improving the image quality of inpainting results, we generalize the proposed model to user-guided image editing by introducing a new sketch generation method. Extensive experiments on various datasets such as Paris StreetView, CelebA-HQ and Places2 clearly demonstrate that our method not only generates more natural inpainting results than previous works, in terms of both perceptual image quality and quantitative metrics, but also helps users generate more flexible results that follow their sketch guidance.
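The role of temperature in masked multi-head self-attention can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and every function name below is hypothetical. Several simplifications are assumptions: each head's temperature is passed in as a fixed number rather than predicted by a learned temperature network as in the paper; heads are formed by splitting feature channels into equal chunks; and "masked" is modeled by simply excluding patches inside the hole from serving as keys.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: not the paper's implementation.
Matrix = List[List[float]]


def softmax_with_temperature(scores: Sequence[float], tau: float) -> List[float]:
    """Softmax over scores / tau; a smaller tau concentrates weight on the top scores."""
    if tau <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / tau for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(queries: Matrix, keys: Matrix, values: Matrix,
                     known: Sequence[bool], tau: float) -> Matrix:
    """One attention head in which only known (unmasked) patches serve as keys.

    Assumption: patches inside the hole (known[j] is False) are simply excluded,
    so missing content is filled only from valid context.
    """
    known_idx = [j for j, ok in enumerate(known) if ok]
    if not known_idx:
        raise ValueError("at least one known patch is required")
    dim = len(values[0])
    out = []
    for q in queries:
        # Dot-product similarity between this query and every known key.
        scores = [sum(a * b for a, b in zip(q, keys[j])) for j in known_idx]
        weights = softmax_with_temperature(scores, tau)
        # Weighted average of the known values.
        out.append([sum(w * values[j][d] for w, j in zip(weights, known_idx))
                    for d in range(dim)])
    return out


def multi_head_masked_attention(queries: Matrix, keys: Matrix, values: Matrix,
                                known: Sequence[bool], taus: Sequence[float]) -> Matrix:
    """Split channels into len(taus) equal chunks, give each chunk (head) its own
    temperature, and concatenate the per-head outputs for every patch.

    Hypothetical simplification: taus are fixed inputs here, whereas the paper
    learns the temperature with a dedicated network.
    """
    n_heads = len(taus)
    dim = len(queries[0])
    if n_heads == 0 or dim % n_heads:
        raise ValueError("feature dim must be divisible by the number of heads")
    step = dim // n_heads
    outputs: Matrix = [[] for _ in queries]
    for h, tau in enumerate(taus):
        sl = slice(h * step, (h + 1) * step)
        head_out = masked_attention([q[sl] for q in queries], [k[sl] for k in keys],
                                    [v[sl] for v in values], known, tau)
        for row, head_row in zip(outputs, head_out):
            row.extend(head_row)
    return outputs
```

With scores `[2.0, 1.0, 0.0]`, `tau = 1.0` puts about 0.67 of the weight on the top score, while `tau = 0.1` puts more than 0.999 on it; this is the sharpening towards a handful of similar patches that the abstract attributes to temperature. Giving each head its own `tau` is one plausible way for some heads to match a few near-identical patches while others blend broader context.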
