TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks
This work addresses the adaptation of general-purpose segmentation models to specific downstream tasks, a problem relevant to computer vision researchers and practitioners, and represents an incremental improvement over existing fine-tuning methods.
The authors tackle the performance gap between fine-tuned Segment-Anything Models (SAM) and domain-specific models on downstream segmentation tasks by proposing TS-SAM, which achieves performance competitive with state-of-the-art domain-specific models across ten public datasets.
Adapter-based fine-tuning has been studied as a way to improve the performance of SAM on downstream tasks. However, a significant performance gap remains between fine-tuned SAMs and domain-specific models. To reduce this gap, we propose Two-Stream SAM (TS-SAM). On the one hand, inspired by the side network in Parameter-Efficient Fine-Tuning (PEFT), we design a lightweight Convolutional Side Adapter (CSA), which integrates the powerful features from SAM into side network training for comprehensive feature fusion. On the other hand, in line with the characteristics of segmentation tasks, we design a Multi-scale Refinement Module (MRM) and a Feature Fusion Decoder (FFD) to preserve both detailed and semantic features. Extensive experiments on ten public datasets from three tasks demonstrate that TS-SAM not only significantly outperforms the recently proposed SAM-Adapter and SSOM, but also achieves competitive performance with the SOTA domain-specific models. Our code is available at: https://github.com/maoyangou147/TS-SAM.
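The two-stream idea described above, a frozen backbone whose intermediate features are fed into a small trainable side network, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names `ConvSideAdapter` and `TwoStreamSketch`, the layer choices (1x1 projection, depthwise 3x3 convolution, residual addition), and the toy convolutional backbone standing in for SAM's image encoder are all assumptions made for illustration. The MRM and FFD are not modelled; a single 1x1 convolution stands in for the decoder.

```python
import torch
from torch import nn


class ConvSideAdapter(nn.Module):
    """Hypothetical side adapter: fuses one frozen backbone feature map into the side stream."""

    def __init__(self, backbone_dim: int, side_dim: int) -> None:
        super().__init__()
        # 1x1 conv projects backbone features down to the (smaller) side width.
        self.reduce = nn.Conv2d(backbone_dim, side_dim, kernel_size=1)
        # Lightweight depthwise + pointwise convs mix the fused features.
        self.mix = nn.Sequential(
            nn.Conv2d(side_dim, side_dim, kernel_size=3, padding=1, groups=side_dim),
            nn.GELU(),
            nn.Conv2d(side_dim, side_dim, kernel_size=1),
        )

    def forward(self, side: torch.Tensor, backbone_feat: torch.Tensor) -> torch.Tensor:
        fused = side + self.reduce(backbone_feat)
        return fused + self.mix(fused)  # residual connection


class TwoStreamSketch(nn.Module):
    """Frozen backbone stream plus a trainable side stream (illustrative only)."""

    def __init__(self, backbone_layers: nn.ModuleList, backbone_dim: int, side_dim: int) -> None:
        super().__init__()
        self.backbone_layers = backbone_layers
        for param in self.backbone_layers.parameters():
            param.requires_grad = False  # the large model stays frozen
        self.stem = nn.Conv2d(backbone_dim, side_dim, kernel_size=1)
        self.adapters = nn.ModuleList(
            [ConvSideAdapter(backbone_dim, side_dim) for _ in backbone_layers]
        )
        # Placeholder head; the paper's MRM and FFD are not reproduced here.
        self.head = nn.Conv2d(side_dim, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        side = self.stem(x)
        feat = x
        for layer, adapter in zip(self.backbone_layers, self.adapters):
            with torch.no_grad():  # no gradients flow through the backbone
                feat = layer(feat)
            side = adapter(side, feat)
        return self.head(side)


# Toy stand-in for a frozen encoder operating on 16-channel feature maps.
backbone = nn.ModuleList([nn.Conv2d(16, 16, kernel_size=3, padding=1) for _ in range(3)])
model = TwoStreamSketch(backbone, backbone_dim=16, side_dim=8)
logits = model(torch.randn(2, 16, 32, 32))
print(logits.shape)  # torch.Size([2, 1, 32, 32])
```

In this sketch only the stem, the adapters, and the head receive gradients, which is the property that makes side-network tuning parameter-efficient: the backbone is run in inference mode and its weights are never updated.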