CV · LG · Jul 31, 2025

SAM-PTx: Text-Guided Fine-Tuning of SAM with Parameter-Efficient, Parallel-Text Adapters

arXiv:2508.00213v1 · 1 citation · h-index: 1 · IEEE Access
Originality: Incremental advance
AI Analysis

This work aims to strengthen semantic guidance in prompt-based segmentation models, a problem of interest to computer vision researchers. The contribution is incremental, since it builds on the existing SAM and CLIP architectures.

The paper tackles the underexplored use of semantic text prompts in the Segment Anything Model (SAM). It introduces SAM-PTx, a parameter-efficient adapter that injects frozen CLIP-derived text embeddings into SAM's image encoder to guide segmentation. The authors report improved performance over purely spatial prompt baselines on COD10K and on low-data subsets of COCO and ADE20K.

The Segment Anything Model (SAM) has demonstrated impressive generalization in prompt-based segmentation. Yet, the potential of semantic text prompts remains underexplored compared to traditional spatial prompts like points and boxes. This paper introduces SAM-PTx, a parameter-efficient approach for adapting SAM using frozen CLIP-derived text embeddings as class-level semantic guidance. Specifically, we propose a lightweight adapter design called Parallel-Text that injects text embeddings into SAM's image encoder, enabling semantics-guided segmentation while keeping most of the original architecture frozen. Our adapter modifies only the MLP-parallel branch of each transformer block, preserving the attention pathway for spatial reasoning. Through supervised experiments and ablations on the COD10K dataset as well as low-data subsets of COCO and ADE20K, we show that incorporating fixed text embeddings as input improves segmentation performance over purely spatial prompt baselines. To our knowledge, this is the first work to use text prompts for segmentation on the COD10K dataset. These results suggest that integrating semantic conditioning into SAM's architecture offers a practical and scalable path for efficient adaptation with minimal computational complexity.
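
The abstract does not spell out how the text embedding enters the MLP-parallel branch, so the following is only a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. It assumes the frozen CLIP text embedding is linearly projected to the token dimension, added to the token, and passed through a small bottleneck (down-projection, ReLU, up-projection) that runs in parallel with the frozen MLP. The class name `ParallelTextAdapterSketch`, the helper functions, the zero-initialized up-projection, and all dimensions are hypothetical choices made for illustration.

```python
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def rand_matrix(rows, cols, rng, std=0.02):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

class ParallelTextAdapterSketch:
    """Hypothetical text-conditioned bottleneck adapter (assumed design)."""

    def __init__(self, dim, text_dim, bottleneck, rng, scale=1.0):
        self.text_proj = rand_matrix(dim, text_dim, rng)  # text -> token dim
        self.down = rand_matrix(bottleneck, dim, rng)     # dim -> bottleneck
        # Assumption: zero-initialized up-projection, so the adapter
        # contributes nothing until it is trained.
        self.up = [[0.0] * bottleneck for _ in range(dim)]
        self.scale = scale

    def __call__(self, token, text_emb):
        conditioned = add(token, matvec(self.text_proj, text_emb))
        hidden = relu(matvec(self.down, conditioned))
        return [self.scale * u for u in matvec(self.up, hidden)]

def frozen_mlp(token, W1, W2):
    """Stand-in for the frozen transformer MLP (biases omitted)."""
    return matvec(W2, relu(matvec(W1, token)))

def mlp_stage(token, text_emb, W1, W2, adapter):
    """Residual MLP stage with the adapter branch added in parallel.

    The attention sub-layer is not modelled here; per the abstract it is
    left untouched.
    """
    mlp_out = frozen_mlp(token, W1, W2)
    adapter_out = adapter(token, text_emb)
    return [t + m + a for t, m, a in zip(token, mlp_out, adapter_out)]

def num_params(adapter):
    """Count the adapter's trainable values."""
    matrices = (adapter.text_proj, adapter.down, adapter.up)
    return sum(len(row) for M in matrices for row in M)

# Toy dimensions, far smaller than anything in SAM or CLIP.
rng = random.Random(0)
dim, text_dim, bottleneck = 8, 6, 2
W1 = rand_matrix(4 * dim, dim, rng)   # frozen
W2 = rand_matrix(dim, 4 * dim, rng)   # frozen
adapter = ParallelTextAdapterSketch(dim, text_dim, bottleneck, rng)

token = [rng.gauss(0.0, 1.0) for _ in range(dim)]
text_emb = [rng.gauss(0.0, 1.0) for _ in range(text_dim)]

out = mlp_stage(token, text_emb, W1, W2, adapter)
baseline = add(token, frozen_mlp(token, W1, W2))
```

In this sketch the stage reproduces the frozen block exactly at initialization, because the up-projection starts at zero. Training would update only the three adapter matrices (80 values in this toy setting, against 512 in the frozen MLP), which is the sense in which such a design is parameter-efficient.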

