Task-driven Prompt Evolution for Foundation Models
This work addresses the performance gap of foundation models in medical imaging applications, representing an incremental advancement in automatic visual prompt-tuning.
The paper tackles the underperformance of promptable foundation models such as SAM in medical image segmentation by proposing a plug-and-play prompt optimization technique (SAMPOT) that uses the downstream task to improve prompts. For lung segmentation in chest X-rays, the optimized prompts improve on the initial prompts in approximately 75% of cases.
Promptable foundation models, particularly the Segment Anything Model (SAM), have emerged as a promising alternative to traditional task-specific supervised learning for image segmentation. However, many evaluation studies have found their performance on medical imaging modalities to be underwhelming compared to conventional deep learning methods. In the world of large pre-trained language and vision-language models, learning prompts from downstream tasks has achieved considerable success in improving performance. In this work, we propose a plug-and-play Prompt Optimization Technique for foundation models like SAM (SAMPOT) that utilizes the downstream segmentation task to optimize the human-provided prompt to obtain improved performance. We demonstrate the utility of SAMPOT on lung segmentation in chest X-ray images and obtain an improvement on a significant number of cases ($\sim75\%$) over human-provided initial prompts. We hope this work will lead to further investigations in the nascent field of automatic visual prompt-tuning.
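The idea of letting the downstream task refine a human-provided prompt can be illustrated with a toy sketch. This is not the SAMPOT implementation: `toy_segmenter` (a soft disk centred on a point prompt), `toy_scorer` (soft Dice against a fixed reference mask), the grid size, and all hyperparameters are assumptions invented for illustration. In particular, a real task-driven scorer would have to rate a mask without access to ground truth, and a real promptable model would replace the disk. The sketch only shows the general shape of the loop: the model and scorer stay fixed, and only the prompt coordinates are updated to increase the task score.

```python
import math

SIZE = 32  # side length of the toy image grid, in pixels


def toy_segmenter(prompt, radius=6.0):
    """Hypothetical stand-in for a frozen promptable model.

    Returns a soft disk mask centred on the (x, y) point prompt.
    """
    px, py = prompt
    return [
        [1.0 / (1.0 + math.exp(math.hypot(x - px, y - py) - radius))
         for x in range(SIZE)]
        for y in range(SIZE)
    ]


# Assumption: the toy scorer compares against a fixed reference mask.
# A real task-driven scorer would have to rate masks without ground truth.
REFERENCE = toy_segmenter((20.0, 18.0))
REFERENCE_SUM = sum(map(sum, REFERENCE))


def toy_scorer(mask):
    """Hypothetical task score: soft Dice overlap with REFERENCE, in [0, 1]."""
    overlap = sum(m * r for m_row, r_row in zip(mask, REFERENCE)
                  for m, r in zip(m_row, r_row))
    return 2.0 * overlap / (sum(map(sum, mask)) + REFERENCE_SUM)


def score_prompt(x, y):
    """Score the mask that the toy segmenter produces for a point prompt."""
    return toy_scorer(toy_segmenter((x, y)))


def optimize_prompt(prompt, steps=60, lr=5.0, eps=0.5):
    """Gradient ascent on the prompt coordinates; model and scorer stay fixed.

    Gradients are central finite differences, so no autodiff library is needed.
    Returns the best prompt seen and its score.
    """
    x, y = prompt
    best, best_score = (x, y), score_prompt(x, y)
    for _ in range(steps):
        grad_x = (score_prompt(x + eps, y) - score_prompt(x - eps, y)) / (2 * eps)
        grad_y = (score_prompt(x, y + eps) - score_prompt(x, y - eps)) / (2 * eps)
        x, y = x + lr * grad_x, y + lr * grad_y
        current = score_prompt(x, y)
        if current > best_score:
            best, best_score = (x, y), current
    return best, best_score


if __name__ == "__main__":
    initial = (14.0, 13.0)  # stands in for the human-provided click
    refined, refined_score = optimize_prompt(initial)
    print("initial prompt:", initial, "score:", round(score_prompt(*initial), 3))
    print("refined prompt:", refined, "score:", round(refined_score, 3))
```

Finite differences keep the sketch dependency-free. With a differentiable model and scorer, the same loop would more plausibly backpropagate the score to the prompt coordinates and use a standard optimizer.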