SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks
This work targets computer vision researchers and practitioners who need higher segmentation accuracy. Its contribution is incremental, since it extends the existing SAM2-UNet design rather than introducing a new paradigm.
The paper adapts SAM2, the second-generation Segment Anything Model, to downstream segmentation tasks by proposing SAM2-UNeXT, which adds an auxiliary DINOv2 encoder, a dual-resolution strategy, and a dense glue layer. The authors report superior performance on four benchmark tasks, including dichotomous image segmentation and camouflaged object detection.
Recent studies have highlighted the potential of adapting the Segment Anything Model (SAM) for various downstream tasks. However, constructing a more powerful and generalizable encoder to further enhance performance remains an open challenge. In this work, we propose SAM2-UNeXT, an advanced framework that builds upon the core principles of SAM2-UNet while extending the representational capacity of SAM2 through the integration of an auxiliary DINOv2 encoder. By incorporating a dual-resolution strategy and a dense glue layer, our approach enables more accurate segmentation with a simple architecture, reducing the need for complex decoder designs. Extensive experiments on four benchmark tasks (dichotomous image segmentation, camouflaged object detection, marine animal segmentation, and remote sensing saliency detection) demonstrate the superior performance of the proposed method. The code is available at https://github.com/WZH0120/SAM2-UNeXT.
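To make the dual-resolution idea concrete, the following is a minimal sketch of the shape arithmetic involved when two encoders see the same image at different resolutions. It is not taken from the paper. The input sizes (1024 and 448), the stage strides (4, 8, 16, 32), and the helper names are illustrative assumptions; only the DINOv2 patch size of 14 is a known property of that model. The dense glue layer itself is not modeled here, since the abstract does not describe its internals.

```python
# Illustrative shape arithmetic only. Resolutions, strides, and function
# names are assumptions for this sketch, not values stated in the abstract.

SAM2_STRIDES = (4, 8, 16, 32)  # assumed strides of a 4-stage hierarchical encoder
DINOV2_PATCH = 14              # patch size of the DINOv2 vision transformers


def sam2_grids(high_res):
    """Side length of each hierarchical feature map for a square input."""
    return [high_res // s for s in SAM2_STRIDES]


def dinov2_grid(low_res):
    """Side length of the DINOv2 patch-token grid for a square input."""
    if low_res % DINOV2_PATCH != 0:
        raise ValueError("low_res must be a multiple of the patch size")
    return low_res // DINOV2_PATCH


def resize_factors(high_res, low_res):
    """Scale that maps the DINOv2 grid onto each hierarchical stage grid."""
    d = dinov2_grid(low_res)
    return [g / d for g in sam2_grids(high_res)]


if __name__ == "__main__":
    print(sam2_grids(1024))           # [256, 128, 64, 32]
    print(dinov2_grid(448))           # 32
    print(resize_factors(1024, 448))  # [8.0, 4.0, 2.0, 1.0]
```

Under these assumed sizes, the low-resolution DINOv2 grid matches the coarsest high-resolution stage exactly, so any layer that fuses the two feature sets would only need to upsample the DINOv2 features for the finer stages.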