CV · Jul 16, 2024

Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes

Zhi Cai, Yingjie Gao, Yaoyan Zheng, Nan Zhou, Di Huang

arXiv:2407.11464v214.722 citations · h-index: 4 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of obtaining extensive labels for object detection in crowded and occluded scenes, an important problem for computer vision applications. The contribution appears incremental, however, because it builds on the existing Segment Anything Model (SAM).

The paper tackles the problem of object detection in crowded scenes by introducing Crowd-SAM, a framework that enhances the Segment Anything Model's performance with few learnable parameters and minimal labeled images, achieving results that rival state-of-the-art fully-supervised methods on benchmarks like CrowdHuman and CityPersons.

In computer vision, object detection is an important task that finds application in many scenarios. However, obtaining extensive labels can be challenging, especially in crowded scenes. Recently, the Segment Anything Model (SAM) has been proposed as a powerful zero-shot segmenter, offering a novel approach to instance segmentation tasks. However, the accuracy and efficiency of SAM and its variants are often compromised when handling objects in crowded and occluded scenes. In this paper, we introduce Crowd-SAM, a SAM-based framework designed to enhance SAM's performance in crowded and occluded scenes at the cost of few learnable parameters and minimal labeled images. We introduce an efficient prompt sampler (EPS) and a part-whole discrimination network (PWD-Net), enhancing mask selection and accuracy in crowded scenes. Despite its simplicity, Crowd-SAM rivals state-of-the-art (SOTA) fully-supervised object detection methods on several benchmarks including CrowdHuman and CityPersons. Our code is available at https://github.com/FelixCaae/CrowdSAM.
