CV Dec 28, 2023

Amodal Ground Truth and Completion in the Wild

Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman

arXiv:2312.17247v222.254 citations h-index: 49 Has Code CVPR

Originality Incremental advance

AI Analysis

This work addresses the challenge of accurate amodal segmentation for computer vision applications, offering a more objective benchmark and improved models, though it is incremental in advancing existing methods.

The paper tackles the problem of subjective ground truth in amodal image segmentation by introducing an automatic pipeline that uses 3D data to derive authentic masks for occluded objects. Its models reach a new state of the art on datasets such as COCOA and the authors' new MP3D-Amodal benchmark.

This paper studies amodal image segmentation: predicting entire object segmentation masks including both visible and invisible (occluded) parts. In previous work, the amodal segmentation ground truth on real images is usually predicted by manual annotation and thus is subjective. In contrast, we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images. This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels. To better handle the amodal completion task in the wild, we explore two architecture variants: a two-stage model that first infers the occluder, followed by amodal mask completion; and a one-stage model that exploits the representation power of Stable Diffusion for amodal segmentation across many categories. Without bells and whistles, our method achieves a new state-of-the-art performance on amodal segmentation datasets that cover a large variety of objects, including COCOA and our new MP3D-Amodal dataset. The dataset, model, and code are available at https://www.robots.ox.ac.uk/~vgg/research/amodal/.
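
To make the terms in the abstract concrete, the following is a minimal sketch of how an amodal mask relates to its visible and occluded parts, and of how a predicted amodal mask might be compared with a ground-truth one. It is not taken from the authors' code: the function names `occluded_part` and `iou`, the toy masks, and the choice of intersection over union as the overlap measure are all assumptions made for illustration. The abstract does not state which metric the paper reports.

```python
from typing import List

# Assumed representation: a row-major binary mask, 1 marks an object pixel.
Mask = List[List[int]]


def occluded_part(amodal: Mask, visible: Mask) -> List[List[bool]]:
    """Pixels inside the full (amodal) mask that are hidden in the image."""
    return [
        [bool(a) and not bool(v) for a, v in zip(a_row, v_row)]
        for a_row, v_row in zip(amodal, visible)
    ]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    # Two empty masks are treated as a perfect match (an assumed convention).
    return inter / union if union else 1.0


# Toy 2x4 example (hypothetical data): the third column is hidden by an occluder.
visible = [[1, 1, 0, 0],
           [1, 1, 0, 0]]
amodal_gt = [[1, 1, 1, 0],
             [1, 1, 1, 0]]
amodal_pred = [[1, 1, 1, 1],
               [1, 1, 0, 0]]

hidden = occluded_part(amodal_gt, visible)
score = iou(amodal_pred, amodal_gt)
```

Here `hidden` marks only the occluded column, and `score` measures how well the predicted full-object mask overlaps the ground-truth one.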

