CVAIFeb 24, 2024

TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation

arXiv:2402.15759v221 citationsh-index: 17Big Data Min Anal
Originality Incremental advance
AI Analysis

This addresses the problem of segmenting unseen targets in medical images without manual annotation for researchers and clinicians, though it is incremental as it builds on existing models like SAM.

The study tackled zero-shot segmentation on multimodal medical images by developing TV-SAM, which uses GPT-4 to generate descriptive prompts without human annotation, resulting in performance closely matching SAM BBOX with gold standard prompts and surpassing state-of-the-art methods on datasets like ISIC and WBC.

This study presents a novel multimodal medical image zero-shot segmentation algorithm named the text-visual-prompt segment anything model (TV-SAM) without any manual annotations. The TV-SAM incorporates and integrates the large language model GPT-4, the vision language model GLIP, and the SAM to autonomously generate descriptive text prompts and visual bounding box prompts from medical images, thereby enhancing the SAM's capability for zero-shot segmentation. Comprehensive evaluations are implemented on seven public datasets encompassing eight imaging modalities to demonstrate that TV-SAM can effectively segment unseen targets across various modalities without additional training. TV-SAM significantly outperforms SAM AUTO and GSAM, closely matching the performance of SAM BBOX with gold standard bounding box prompts and surpasses the state-of-the-art methods on specific datasets such as ISIC and WBC. The study indicates that TV-SAM serves as an effective multimodal medical image zero-shot segmentation algorithm, highlighting the significant contribution of GPT-4 to zero-shot segmentation. By integrating foundational models such as GPT-4, GLIP, and SAM, the ability to address complex problems in specialized domains can be enhanced.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes