Quantifying the Limits of Segmentation Foundation Models: Modeling Challenges in Segmenting Tree-Like and Low-Contrast Objects
This identifies crucial failure modes for real-world applications of segmentation foundation models, though it's primarily diagnostic rather than offering solutions.
The paper systematically studied why image segmentation foundation models struggle with tree-like objects and low-contrast textures, showing performance correlates with interpretable metrics for these factors and that fine-tuning doesn't resolve this fundamental limitation.
Image segmentation foundation models (SFMs) like Segment Anything Model (SAM) have achieved impressive zero-shot and interactive segmentation across diverse domains. However, they struggle to segment objects with certain structures, particularly those with dense, tree-like morphology and low textural contrast from their surroundings. These failure modes are crucial for understanding the limitations of SFMs in real-world applications. To systematically study this issue, we introduce interpretable metrics quantifying object tree-likeness and textural separability. On carefully controlled synthetic experiments and real-world datasets, we show that SFM performance (\eg, SAM, SAM 2, HQ-SAM) noticeably correlates with these factors. We attribute these failures to SFMs misinterpreting local structure as global texture, resulting in over-segmentation or difficulty distinguishing objects from similar backgrounds. Notably, targeted fine-tuning fails to resolve this issue, indicating a fundamental limitation. Our study provides the first quantitative framework for modeling the behavior of SFMs on challenging structures, offering interpretable insights into their segmentation capabilities.