CVAug 12, 2024

Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes

arXiv:2408.05936v19 citationsh-index: 11
Originality Incremental advance
AI Analysis

This work addresses the challenge of fine-tuning large models for domain-specific segmentation tasks where data is scarce, representing an incremental advance in adaptor learning techniques.

The paper tackles the problem of adapting large foundational vision models like SAM to specialized tasks with limited data by introducing a novel multi-scale contrastive adaptor learning method, achieving substantial performance improvements such as a 20.0% MAE boost on COD10K and 15.4% BER improvement on ISTD.

Foundational vision models, such as the Segment Anything Model (SAM), have achieved significant breakthroughs through extensive pre-training on large-scale visual datasets. Despite their general success, these models may fall short in specialized tasks with limited data, and fine-tuning such large-scale models is often not feasible. Current strategies involve incorporating adaptors into the pre-trained SAM to facilitate downstream task performance with minimal model adjustment. However, these strategies can be hampered by suboptimal learning approaches for the adaptors. In this paper, we introduce a novel Multi-scale Contrastive Adaptor learning method named MCA-SAM, which enhances adaptor performance through a meticulously designed contrastive learning framework at both token and sample levels. Our Token-level Contrastive adaptor (TC-adaptor) focuses on refining local representations by improving the discriminability of patch tokens, while the Sample-level Contrastive adaptor (SC-adaptor) amplifies global understanding across different samples. Together, these adaptors synergistically enhance feature comparison within and across samples, bolstering the model's representational strength and its ability to adapt to new tasks. Empirical results demonstrate that MCA-SAM sets new benchmarks, outperforming existing methods in three challenging domains: camouflage object detection, shadow segmentation, and polyp segmentation. Specifically, MCA-SAM exhibits substantial relative performance enhancements, achieving a 20.0% improvement in MAE on the COD10K dataset, a 6.0% improvement in MAE on the CAMO dataset, a 15.4% improvement in BER on the ISTD dataset, and a 7.9% improvement in mDice on the Kvasir-SEG dataset.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes