CV · IV · Jul 24, 2022

Semantic-guided Multi-Mask Image Harmonization

arXiv:2207.11722v1 · 19 citations · h-index: 32 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses image harmonization for complex scenes with multiple semantic regions and no input masks. It appears incremental, as it extends single-mask methods to multi-mask scenarios.

The paper tackles the problem of harmonizing multiple foregrounds pasted from different images without input masks. It proposes a semantic-guided multi-mask image harmonization task, constructs two benchmarks (HScene with 150 classes and HLIP with 19 classes), and shows that an operator mask-based network improves on state-of-the-art methods that directly regress RGB images when the perturbations are structural.

Previous harmonization methods focus on adjusting one inharmonious region in an image based on an input mask. They may struggle when different semantic regions undergo different perturbations and no input mask is available. To handle the case where several foregrounds from different images have been pasted into one image and must be harmonized towards different domain directions without any mask as input, we propose a new semantic-guided multi-mask image harmonization task. Unlike the previous single-mask image harmonization task, each inharmonious image is perturbed with different methods according to the semantic segmentation masks. Two challenging benchmarks, HScene and HLIP, are constructed based on 150 and 19 semantic classes, respectively. Furthermore, previous baselines focus on regressing the exact value of each pixel of the harmonized image. Their results come from a "black box" and cannot be edited. In this work, we propose a novel way to edit inharmonious images by predicting a series of operator masks. The masks indicate the level and the position at which to apply a certain image editing operation, which could be the brightness, the saturation, or the color in a specific dimension. The operator masks give users more flexibility to edit the image further. Extensive experiments verify that the operator mask-based network can further improve on state-of-the-art methods that directly regress RGB images when the perturbations are structural. Experiments on our constructed benchmarks verify that the proposed operator mask-based framework can locate and modify the inharmonious regions in more complex scenes. Our code and models are available at https://github.com/XuqianRen/Semantic-guided-Multi-mask-Image-Harmonization.git.
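
To make the operator-mask idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`apply_brightness`, `apply_saturation`, `apply_color`, `perturb_by_segments`), the additive and scaling formulas, and the value ranges are all assumptions chosen for illustration. The abstract only states that each mask gives the level and position of an editing operation. The sketch also shows one plausible way to build a multi-mask inharmonious image by perturbing each semantic class differently.

```python
import numpy as np

# Assumed conventions (not from the paper): images are H x W x 3 floats
# in [0, 1]; each operator mask is H x W, and 0 means "no change here".

def apply_brightness(img, mask):
    """Shift brightness per pixel by the mask value (assumed additive form)."""
    return np.clip(img + mask[..., None], 0.0, 1.0)

def apply_saturation(img, mask):
    """Scale each pixel's distance from its gray value by (1 + mask)."""
    gray = img.mean(axis=2, keepdims=True)
    return np.clip(gray + (img - gray) * (1.0 + mask[..., None]), 0.0, 1.0)

def apply_color(img, mask, channel):
    """Shift a single color channel per pixel by the mask value."""
    out = img.copy()
    out[..., channel] = np.clip(out[..., channel] + mask, 0.0, 1.0)
    return out

def perturb_by_segments(img, seg, shifts):
    """Hypothetical multi-mask perturbation: apply a different brightness
    shift to each semantic class in the segmentation map `seg`."""
    mask = np.zeros(seg.shape, dtype=img.dtype)
    for cls, shift in shifts.items():
        mask[seg == cls] = shift
    return apply_brightness(img, mask), mask

# Toy example: a flat gray image with two semantic regions.
img = np.full((4, 4, 3), 0.5)
seg = np.zeros((4, 4), dtype=int)
seg[:, 2:] = 1
perturbed, applied = perturb_by_segments(img, seg, {0: 0.2, 1: -0.1})

# An ideal predicted operator mask would undo the perturbation per region.
restored = apply_brightness(perturbed, -applied)
```

Because the output is a set of masks rather than final pixel values, a user could in principle rescale or zero out one mask (for example, halve the brightness correction in one region) and re-apply the operations, which is the editability the abstract contrasts with direct RGB regression.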

Code Implementations: 1 repo
