CV · May 13, 2023

AURA : Automatic Mask Generator using Randomized Input Sampling for Object Removal

arXiv:2305.07857v2
Originality: Incremental advance
AI Analysis

This work addresses the specific challenge of improving object removal in images for computer vision applications, representing an incremental advancement by focusing on mask generation rather than developing a new inpainting method.

The paper tackles the problem of generating optimal input masks for object removal using existing image inpainting networks, proposing an automatic mask generator that outperforms semantic segmentation masks and introduces new evaluation metrics (FID* and U-IDS*) that align with human judgment.

The objective of image inpainting is to fill missing regions of an image in a visually plausible way. Recently, deep-learning-based image inpainting networks have generated outstanding results, and some works use these networks as object removers by masking unwanted objects in an image. However, while trying to remove objects better with their networks, previous works have paid less attention to the importance of the input mask. In this paper, we focus on generating the input mask so that an off-the-shelf image inpainting network removes objects better. We propose an automatic mask generator, inspired by an explainable AI (XAI) method, whose output removes objects better than a semantic segmentation mask. The proposed method generates an importance map from randomly sampled input masks and quantitatively estimated scores of the completed images obtained from those masks. The output mask is selected by a judge module from candidate masks generated from the importance map. We design the judge module to quantitatively estimate the quality of the object removal results. In addition, we empirically find that the evaluation methods used in previous works reporting object removal results are not appropriate for estimating the performance of an object remover. Therefore, we propose new evaluation metrics (FID$^*$ and U-IDS$^*$) to properly evaluate object removers. Experiments confirm that our method removes target-class objects better than masks generated from semantic segmentation maps, and that the two proposed metrics make judgments consistent with those of humans.
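The pipeline described in the abstract (random masks, scored completions, an importance map, candidate masks, a judge) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the grid-based mask sampling, the quantile thresholds, and the names `importance_map`, `candidate_masks`, `select_mask`, `toy_inpaint`, and `toy_score` are all hypothetical, and the toy inpainter and score stand in for a real inpainting network and a real removal-quality estimator.

```python
import numpy as np

def importance_map(image, inpaint, score, n_masks=500, grid=8, p=0.5, seed=0):
    """Average random masks, each weighted by the score of its completed image."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    cell_h, cell_w = -(-h // grid), -(-w // grid)  # ceiling division
    acc = np.zeros((h, w))
    hits = np.zeros((h, w))
    for _ in range(n_masks):
        # Coarse random grid, upsampled to image size (1 = pixel is masked).
        coarse = (rng.random((grid, grid)) < p).astype(np.float64)
        mask = np.kron(coarse, np.ones((cell_h, cell_w)))[:h, :w]
        acc += score(inpaint(image, mask)) * mask
        hits += mask
    # Mean score over the masks that covered each pixel.
    return acc / np.maximum(hits, 1)

def candidate_masks(imp, quantiles=(0.9, 0.8, 0.7, 0.6, 0.5)):
    """Threshold the importance map at several levels, tightest mask first."""
    return [imp >= np.quantile(imp, q) for q in quantiles]

def select_mask(image, candidates, inpaint, judge):
    """Return the candidate whose completed image the judge scores highest.
    Ties go to the earliest (here: tightest) candidate."""
    scores = [judge(inpaint(image, m)) for m in candidates]
    return candidates[int(np.argmax(scores))]

# --- Toy stand-ins (assumptions, not real models) ---
image = np.zeros((32, 32))
image[8:16, 8:16] = 1.0  # a bright square plays the role of the target object

def toy_inpaint(img, mask):
    out = img.copy()
    out[mask > 0] = 0.0  # "fill" masked pixels with the background value
    return out

def toy_score(completed):
    # Fraction of the object's brightness that was removed.
    return 1.0 - completed.sum() / image.sum()

imp = importance_map(image, toy_inpaint, toy_score)
best = select_mask(image, candidate_masks(imp), toy_inpaint, toy_score)
```

In this toy setting the importance map is higher over the square than over the background, and the selected mask covers the square. With a real inpainting network and a learned judge, the sampling scheme, thresholds, and scoring would need to follow the paper rather than this sketch.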

