CV · Jan 13

SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds

arXiv:2601.08982v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses occlusion challenges in human segmentation for applications such as surveillance and robotics, but the contribution is incremental: it builds on the existing SAM foundation with minor modifications.

The paper tackles human instance segmentation in crowded, occluded scenes by adapting Segment Anything (SAM) 2.1 with pose guidance. It reports improved robustness and accuracy across multiple datasets, as well as accurate mask prediction from as few as a single keypoint.

Segment Anything (SAM) provides an unprecedented foundation for human segmentation, but it may struggle under occlusion, where keypoints can be partially or fully invisible. We adapt SAM 2.1 for pose-guided segmentation with minimal encoder modifications, retaining its strong generalization. Using a fine-tuning strategy called PoseMaskRefine, we incorporate highly visible pose keypoints into the iterative correction process originally employed by SAM, yielding improved robustness and accuracy across multiple datasets. During inference, we simplify prompting by selecting only the three keypoints with the highest visibility. This strategy reduces sensitivity to common errors, such as missing body parts or misclassified clothing, and allows accurate mask prediction from as few as a single keypoint. Our results demonstrate that pose-guided fine-tuning of SAM enables effective, occlusion-aware human segmentation while preserving the generalization capabilities of the original model. The code and pretrained models will be available at https://mirapurkrabek.github.io/BBox-MaskPose.
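The inference-time prompting rule described in the abstract (keep only the three most visible keypoints) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `select_pose_prompts` helper, the `(x, y, visibility)` row layout, and the `min_visibility` threshold are all hypothetical. The output pairs point coordinates with foreground labels of 1, mirroring the point-prompt convention of SAM-style predictors.

```python
import numpy as np


def select_pose_prompts(keypoints, k=3, min_visibility=0.0):
    """Pick the k most visible keypoints of one person as positive point prompts.

    Hypothetical helper (assumption, not from the paper): `keypoints` is a
    (K, 3) array of (x, y, visibility) rows, e.g. 17 COCO joints with a
    visibility score in [0, 1] from some pose estimator.
    """
    kpts = np.asarray(keypoints, dtype=np.float32)
    if kpts.ndim != 2 or kpts.shape[1] != 3:
        raise ValueError("expected keypoints of shape (K, 3)")

    vis = kpts[:, 2]
    # Joint indices sorted by descending visibility; a stable sort keeps joint order on ties.
    order = np.argsort(-vis, kind="stable")
    # Drop joints at or below the (assumed) visibility threshold, then keep at most k.
    order = order[vis[order] > min_visibility][:k]

    point_coords = kpts[order, :2]                      # (N, 2) pixel coordinates
    point_labels = np.ones(len(order), dtype=np.int32)  # 1 = foreground point
    return point_coords, point_labels


if __name__ == "__main__":
    person = [
        [120, 80, 0.95],   # clearly visible joint
        [110, 70, 0.10],   # mostly occluded joint
        [130, 70, 0.90],
        [100, 200, 0.00],  # invisible joint
        [140, 150, 0.70],
    ]
    coords, labels = select_pose_prompts(person)
    print(coords.tolist(), labels.tolist())
```

Raising `min_visibility` for a heavily occluded person can leave only one prompt point, which corresponds to the single-keypoint case the abstract mentions.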

