CV | Nov 1, 2024

ZIM: Zero-Shot Image Matting for Anything

arXiv:2411.00626v2 | 9 citations | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

This work targets precise mask generation, giving computer vision practitioners a foundation for zero-shot matting and for downstream applications such as image inpainting and 3D NeRF. It is an incremental advance in that it builds on SAM.

The paper addresses the difficulty the Segment Anything Model (SAM) has in generating fine-grained, precise masks. It proposes ZIM, a zero-shot image matting model that, evaluated on the MicroMat-3K test set, outperforms existing methods in fine-grained mask generation and zero-shot generalization.

The recent segmentation foundation model, Segment Anything Model (SAM), exhibits strong zero-shot segmentation capabilities, but it falls short in generating fine-grained precise masks. To address this limitation, we propose a novel zero-shot image matting model, called ZIM, with two key contributions: First, we develop a label converter that transforms segmentation labels into detailed matte labels, constructing the new SA1B-Matte dataset without costly manual annotations. Training SAM with this dataset enables it to generate precise matte masks while maintaining its zero-shot capability. Second, we design the zero-shot matting model equipped with a hierarchical pixel decoder to enhance mask representation, along with a prompt-aware masked attention mechanism to improve performance by enabling the model to focus on regions specified by visual prompts. We evaluate ZIM using the newly introduced MicroMat-3K test set, which contains high-quality micro-level matte labels. Experimental results show that ZIM outperforms existing methods in fine-grained mask generation and zero-shot generalization. Furthermore, we demonstrate the versatility of ZIM in various downstream tasks requiring precise masks, such as image inpainting and 3D NeRF. Our contributions provide a robust foundation for advancing zero-shot matting and its downstream applications across a wide range of computer vision tasks. The code is available at https://github.com/naver-ai/ZIM.
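
The idea behind prompt-aware masked attention, as the abstract describes it, is that attention is restricted to the regions a visual prompt specifies. The sketch below illustrates that general idea with NumPy only; it is not the authors' implementation. The function names (`box_prompt_mask`, `masked_cross_attention`), the use of a hard box mask, and the feature-map sizes are all assumptions made for illustration.

```python
import numpy as np

def box_prompt_mask(height, width, box):
    """Return a flat boolean mask, True for feature-map cells inside the box.

    `box` is (x0, y0, x1, y1) in feature-map coordinates, end-exclusive.
    Hypothetical helper: a hard box mask is an assumption of this sketch.
    """
    x0, y0, x1, y1 = box
    ys, xs = np.mgrid[0:height, 0:width]
    inside = (xs >= x0) & (xs < x1) & (ys >= y0) & (ys < y1)
    return inside.reshape(-1)

def masked_cross_attention(queries, keys, values, allowed):
    """Scaled dot-product attention where keys outside `allowed` get zero weight."""
    if not allowed.any():
        raise ValueError("the prompt mask must allow at least one key")
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)                 # (num_queries, num_keys)
    scores = np.where(allowed[None, :], scores, -np.inf)   # block keys outside the prompt
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

# Toy example: an 8x8 feature map with 16 channels and two query tokens.
rng = np.random.default_rng(0)
H, W, D = 8, 8, 16
image_feats = rng.normal(size=(H * W, D))
query_tokens = rng.normal(size=(2, D))
allowed = box_prompt_mask(H, W, box=(2, 2, 6, 6))
out, weights = masked_cross_attention(query_tokens, image_feats, image_feats, allowed)
```

In this toy setup each query token distributes all of its attention over the 16 cells inside the box and none over the rest of the feature map. How ZIM actually builds its masks from point or box prompts is described in the paper and the linked repository.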

Code Implementations: 1 repo