Quickly Tuning Foundation Models for Image Segmentation
This work addresses the challenge of adapting large foundation models to specialized segmentation tasks, offering computer vision researchers and practitioners an incremental improvement in the automation and efficiency of fine-tuning.
The paper addresses the fine-tuning of foundation models such as SAM for domain-specific image segmentation, which typically requires manual effort. It introduces QTT-SEG, a meta-learning approach that automates and accelerates this process. QTT-SEG consistently improves on SAM's zero-shot results and surpasses AutoGluon Multimodal on most binary tasks within three minutes.
Foundation models like SAM (Segment Anything Model) exhibit strong zero-shot image segmentation performance, but often fall short on domain-specific tasks. Fine-tuning these models typically requires significant manual effort and domain expertise. In this work, we introduce QTT-SEG, a meta-learning-driven approach for automating and accelerating the fine-tuning of SAM for image segmentation. Built on the Quick-Tune hyperparameter optimization framework, QTT-SEG predicts high-performing configurations using meta-learned cost and performance models, efficiently navigating a search space of over 200 million possibilities. We evaluate QTT-SEG on eight binary and five multiclass segmentation datasets under tight time constraints. Our results show that QTT-SEG consistently improves upon SAM's zero-shot performance and surpasses AutoGluon Multimodal, a strong AutoML baseline, on most binary tasks within three minutes. On multiclass datasets, QTT-SEG delivers consistent gains as well. These findings highlight the promise of meta-learning in automating model adaptation for specialized segmentation tasks. Code available at: https://github.com/ds-brx/QTT-SEG/
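To make the idea of selecting configurations with cost and performance models more concrete, the following is a minimal sketch and not the QTT-SEG implementation. It assumes two predictors are already available (in QTT-SEG these would be meta-learned; here they are hand-written toy functions) and greedily picks the candidate with the highest predicted performance whose predicted cost fits the remaining time budget. The `Config` fields, the toy predictors, and the tiny search space are all hypothetical and chosen only for illustration; the actual search space and acquisition strategy are defined in the paper and its code.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Config:
    # Hypothetical hyperparameters; the real QTT-SEG search space differs.
    learning_rate: float
    lora_rank: int
    epochs: int


def toy_cost(cfg: Config) -> float:
    """Stand-in for a learned cost model: predicted training time in seconds."""
    return 20.0 * cfg.epochs + 2.0 * cfg.lora_rank


def toy_perf(cfg: Config) -> float:
    """Stand-in for a learned performance model: predicted segmentation score."""
    lr_penalty = 100.0 * abs(cfg.learning_rate - 1e-4)
    return 0.5 + 0.05 * cfg.epochs + 0.01 * cfg.lora_rank - lr_penalty


def select_config(
    candidates: Iterable[Config],
    perf_model: Callable[[Config], float],
    cost_model: Callable[[Config], float],
    budget_s: float,
) -> Optional[Config]:
    """Return the feasible candidate with the highest predicted performance.

    A candidate is feasible if its predicted cost fits within the budget.
    Returns None when no candidate fits.
    """
    feasible = [c for c in candidates if cost_model(c) <= budget_s]
    if not feasible:
        return None
    return max(feasible, key=perf_model)


# Enumerate a small illustrative grid (3 * 3 * 4 = 36 configurations).
candidates = [
    Config(lr, rank, epochs)
    for lr, rank, epochs in itertools.product(
        [1e-5, 1e-4, 1e-3], [4, 8, 16], [1, 2, 4, 8]
    )
]

# Three-minute budget, matching the time limit mentioned in the abstract.
best = select_config(candidates, toy_perf, toy_cost, budget_s=180.0)
print(best)
```

With learned predictors in place of the toy functions, the same pattern avoids training every configuration: only candidates the models rate as both promising and affordable are actually run. Exhaustive enumeration, as done here, would not scale to a space of over 200 million configurations, so a practical system would score a sampled subset instead.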