CV · AI · Mar 11

BALD-SAM: Disagreement-based Active Prompting in Interactive Segmentation

arXiv:2603.10828v1 · 10.3 · h-index: 14
Predicted impact: top 66% in CV (last 90 days) · Originality: Incremental advance
AI Analysis

This work addresses efficient annotation workflows for interactive segmentation in domains such as medical and seismic imaging. It is an incremental advance, building on existing SAM and active learning methods.

The paper tackles the problem of choosing where to place the next spatial prompt for interactive segmentation with the Segment Anything Model (SAM). It proposes BALD-SAM, a Bayesian active learning framework that identifies the most informative regions to prompt. The method achieves strong cross-domain performance, ranking first or second on 14 of 16 benchmarks, surpassing human prompting and, in several categories, even oracle prompting.
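For reference, the standard BALD criterion scores a candidate by the mutual information between its label and the model parameters. As a sketch, assuming a binary foreground/background prediction $y_u$ at location $u$ of image $x$, and a posterior $p(\theta \mid \mathcal{D})$ placed only over the prediction-head parameters $\theta$ (the backbone being frozen), the score, its Monte Carlo estimate with $S$ posterior samples $\theta_s$, and the selected prompt location would be:

```latex
\begin{aligned}
\mathrm{BALD}(u) &= \mathcal{I}\big[y_u;\theta \mid x,\mathcal{D}\big]
  = \mathbb{H}\Big[\mathbb{E}_{p(\theta\mid\mathcal{D})}\, p(y_u \mid x,\theta)\Big]
  - \mathbb{E}_{p(\theta\mid\mathcal{D})}\Big[\mathbb{H}\big[p(y_u \mid x,\theta)\big]\Big] \\
\widehat{\mathrm{BALD}}(u) &= \mathbb{H}\Big[\tfrac{1}{S}\textstyle\sum_{s=1}^{S} p(y_u \mid x,\theta_s)\Big]
  - \tfrac{1}{S}\textstyle\sum_{s=1}^{S} \mathbb{H}\big[p(y_u \mid x,\theta_s)\big],
  \qquad \theta_s \sim p(\theta \mid \mathcal{D}) \\
u^{\star} &= \arg\max_{u}\ \widehat{\mathrm{BALD}}(u)
\end{aligned}
```

Here $\mathbb{H}$ is Shannon entropy. The first term is large wherever the averaged prediction is uncertain; the second subtracts the uncertainty that every posterior sample already reports, so what remains is disagreement between samples, i.e. epistemic uncertainty. Treating the argmax as the next prompt location is an assumption of this sketch, not a detail confirmed by the abstract.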

The Segment Anything Model (SAM) has revolutionized interactive segmentation through spatial prompting. While existing work primarily focuses on automating prompts in various settings, real-world annotation workflows involve iterative refinement where annotators observe model outputs and strategically place prompts to resolve ambiguities. Current pipelines typically rely on the annotator's visual assessment of predicted mask quality. We postulate that a principled approach for automated interactive prompting is to use a model-derived criterion to identify the most informative region for the next prompt. In this work, we establish active prompting: a spatial active learning approach in which locations within an image constitute an unlabeled pool and prompts serve as queries to prioritize information-rich regions, increasing the utility of each interaction. We further present BALD-SAM: a principled framework adapting Bayesian Active Learning by Disagreement (BALD) to spatial prompt selection by quantifying epistemic uncertainty. To do so, we freeze the entire model and apply Bayesian uncertainty modeling only to a small learned prediction head, making otherwise intractable uncertainty estimation practical for large, multi-million-parameter foundation models. Across 16 datasets spanning natural, medical, underwater, and seismic domains, BALD-SAM demonstrates strong cross-domain performance, ranking first or second on 14 of 16 benchmarks. We validate these gains through a comprehensive ablation suite covering 3 SAM backbones and 35 Laplace posterior configurations, amounting to 38 distinct ablation settings. Beyond strong average performance, BALD-SAM surpasses human prompting and, in several categories, even oracle prompting, while consistently outperforming one-shot baselines in final segmentation quality, particularly on thin and structurally complex objects.
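The idea of keeping the backbone frozen and placing a posterior only on a small head can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the authors' implementation: random vectors stand in for frozen SAM per-pixel features, the head is a single linear logistic layer, the posterior is a diagonal Laplace approximation (one of many possible Laplace configurations), and the helpers `fit_logistic_head`, `diag_laplace_precision` and `bald_map` are hypothetical names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_head(feats, labels, prior_prec=1.0, lr=0.5, steps=500):
    """Hypothetical helper: MAP-fit a linear logistic head on frozen features.

    feats: (N, D) per-pixel features; labels: (N,) foreground labels in {0, 1}.
    Minimizes Bernoulli NLL + (prior_prec / 2) * ||w||^2 by gradient descent.
    """
    n, d = feats.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(feats @ w)
        grad = feats.T @ (p - labels) + prior_prec * w
        w -= lr * grad / n  # scaled step on the negative log-posterior
    return w


def diag_laplace_precision(feats, w_map, prior_prec=1.0):
    """Hypothetical helper: diagonal Gauss-Newton Hessian at the MAP estimate,
    used as the precision of a Gaussian (Laplace) posterior over head weights."""
    p = sigmoid(feats @ w_map)
    return prior_prec + ((p * (1.0 - p))[:, None] * feats**2).sum(axis=0)


def binary_entropy(p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def bald_map(feats, w_map, precision, n_samples=32, rng=None):
    """Hypothetical helper: Monte Carlo BALD score for every pixel.

    BALD = H[mean_s p_s] - mean_s H[p_s], with head weights sampled from
    N(w_map, diag(1 / precision)).
    """
    rng = np.random.default_rng(rng)
    std = 1.0 / np.sqrt(precision)
    ws = w_map + std * rng.standard_normal((n_samples, w_map.size))  # (S, D)
    probs = sigmoid(feats @ ws.T)  # (N, S) foreground probability per sample
    return binary_entropy(probs.mean(axis=1)) - binary_entropy(probs).mean(axis=1)


# Toy usage: a 16x16 "image" with 8-dim stand-in features and a synthetic mask.
rng = np.random.default_rng(0)
H, W, D = 16, 16, 8
feats = rng.standard_normal((H * W, D))
true_w = rng.standard_normal(D)
labels = (feats @ true_w > 0).astype(float)

w_map = fit_logistic_head(feats, labels)
prec = diag_laplace_precision(feats, w_map)
scores = bald_map(feats, w_map, prec, rng=1)

# Assumption: the next prompt goes where posterior samples disagree most.
row, col = np.unravel_index(np.argmax(scores), (H, W))
print(f"next prompt at (row={row}, col={col})")
```

Only the head's weights are sampled, so each extra posterior sample costs one small matrix product over cached features rather than another pass through the backbone. That is the practical point of restricting the Bayesian treatment to the head, although the real head, posterior, and sample count used by BALD-SAM may differ from this sketch.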
