CV · Apr 13

Seeing Through the Tool: A Controlled Benchmark for Occlusion Robustness in Foundation Segmentation Models

arXiv:2604.11711 · 56.1 · h-index: 4
Predicted impact: top 62% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For researchers and clinicians using foundation segmentation models in endoscopy, this work provides a systematic evaluation framework and reveals that model selection depends on clinical intent regarding occlusion handling.

The paper introduces OccSAM-Bench, a benchmark for evaluating the occlusion robustness of SAM-family models in surgical endoscopy. It reveals two distinct model archetypes: Occluder-Aware models (SAM, SAM 2, SAM 3, MedSAM3), which prioritize visible tissue and reject instruments, and Occluder-Agnostic models (MedSAM, MedSAM2), which predict into occluded regions. Occlusion robustness therefore varies across architectures.

Occlusion, where target structures are partially hidden by surgical instruments or overlapping tissues, remains a critical yet underexplored challenge for foundation segmentation models in clinical endoscopy. We introduce OccSAM-Bench, a benchmark designed to systematically evaluate SAM-family models under controlled, synthesized surgical occlusion. Our framework simulates two occlusion types (surgical tool overlay and cutout) across three calibrated severity levels on three public polyp datasets. We propose a novel three-region evaluation protocol that decomposes segmentation performance into full, visible-only, and invisible targets. This protocol exposes behaviors that standard amodal evaluation obscures, revealing two distinct model archetypes: Occluder-Aware models (SAM, SAM 2, SAM 3, MedSAM3), which prioritize visible tissue delineation and reject instruments, and Occluder-Agnostic models (MedSAM, MedSAM2), which confidently predict into occluded regions. SAM-Med2D aligns with neither and underperforms across all conditions. Ultimately, our results demonstrate that occlusion robustness is not uniform across architectures, and model selection must be driven by specific clinical intent: either conservative visible-tissue segmentation or amodal inference of hidden anatomy.
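The three-region decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which score is computed per region, so the code below assumes Dice for the full and visible-only targets and plain coverage (recall) for the invisible target. The function names `dice`, `coverage`, and `three_region_scores` are hypothetical, as is the toy 8x8 scene.

```python
import numpy as np


def dice(pred, target):
    """Dice overlap between two binary masks (nan if both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return float("nan")
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def coverage(pred, target):
    """Fraction of target pixels that the prediction covers (nan if target is empty)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if target.sum() == 0:
        return float("nan")
    return float(np.logical_and(pred, target).sum() / target.sum())


def three_region_scores(pred, gt_full, occluder):
    """Score one prediction against full, visible-only, and invisible targets.

    Assumption: Dice for full/visible, coverage for invisible. The paper's
    exact per-region metric is not given in the abstract.
    """
    gt_full = np.asarray(gt_full, dtype=bool)
    occluder = np.asarray(occluder, dtype=bool)
    visible = gt_full & ~occluder    # target pixels not hidden by the occluder
    invisible = gt_full & occluder   # target pixels hidden by the occluder
    return {
        "full": dice(pred, gt_full),
        "visible": dice(pred, visible),
        "invisible": coverage(pred, invisible),
    }


# Toy scene: a 4x4 target with its right half covered by a synthetic occluder.
gt_full = np.zeros((8, 8), dtype=bool)
gt_full[2:6, 2:6] = True
occluder = np.zeros((8, 8), dtype=bool)
occluder[:, 4:] = True

# A prediction that stops at the occluder boundary vs. one that predicts through it.
aware = three_region_scores(gt_full & ~occluder, gt_full, occluder)
agnostic = three_region_scores(gt_full, gt_full, occluder)
print("occluder-aware:   ", aware)
print("occluder-agnostic:", agnostic)
```

In this toy case the prediction that stops at the occluder scores perfectly on the visible target but covers none of the hidden region, while the prediction that extends through the occluder does the opposite. A single full-target score would not separate the two behaviors as clearly.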

