CV, AI | Oct 26, 2025

SARCLIP: A Vision Language Foundation Model for Semantic Understanding and Target Recognition in SAR Imagery

Qiwei Ma, Zhiyu Wang, Wang Liu, Xukun Lu, Bin Deng, Puhong Duan, Xudong Kang, Shutao Li

arXiv:2510.22665v1 | 10.2 | h-index: 47

Originality: Incremental advance

AI Analysis

This work addresses the need for better semantic understanding and target recognition in SAR imagery, which matters for applications such as all-weather surveillance. The contribution is incremental, however, because it adapts existing vision-language methods to a specific domain.

The paper tackles limited multimodal alignment and zero-shot target recognition in Synthetic Aperture Radar (SAR) imagery. It introduces SARCLIP, a vision-language foundation model trained on a new large-scale dataset, and reports that it significantly outperforms state-of-the-art models on image-text retrieval and zero-shot classification.

Synthetic Aperture Radar (SAR) has emerged as a crucial imaging modality due to its all-weather capabilities. While recent advances in self-supervised learning and Masked Image Modeling (MIM) have paved the way for SAR foundation models, these approaches primarily focus on low-level visual features and often overlook multimodal alignment and zero-shot target recognition within SAR imagery. To address this limitation, we construct SARCLIP-1M, a large-scale vision-language dataset comprising over one million text-image pairs aggregated from existing datasets. We further introduce SARCLIP, the first vision-language foundation model tailored for the SAR domain. SARCLIP is trained with a contrastive vision-language learning approach using a domain transfer strategy, enabling it to bridge the gap between SAR imagery and textual descriptions. Extensive experiments on image-text retrieval and zero-shot classification tasks demonstrate the superior performance of SARCLIP in feature extraction and interpretation, significantly outperforming state-of-the-art foundation models and advancing the semantic understanding of SAR imagery. The code and datasets will be released soon.
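
The abstract describes contrastive vision-language training and zero-shot classification without giving the objective. As a rough illustration only, the sketch below shows the generic CLIP-style symmetric contrastive loss and a cosine-similarity zero-shot classifier on toy embeddings. It is not the authors' implementation: the function names, the temperature of 0.07, and the two-dimensional embeddings are all assumptions made for the example, and the paper's domain transfer strategy is not modeled.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_logits(image_embs, text_embs, temperature=0.07):
    """Temperature-scaled cosine similarity between every image and every text."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct column is i (the matched pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric loss: average of image-to-text and text-to-image cross-entropy."""
    logits = similarity_logits(image_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(transposed))


def zero_shot_classify(image_emb, class_text_embs):
    """Return the index of the class prompt most similar to the image."""
    scores = similarity_logits([image_emb], class_text_embs)[0]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy embeddings (hypothetical): pair k of images matches pair k of texts.
images = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[1.0, 0.0], [0.0, 1.0]]
texts_shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = contrastive_loss(images, texts_aligned)
loss_shuffled = contrastive_loss(images, texts_shuffled)
predicted_class = zero_shot_classify([0.9, 0.1], texts_aligned)
```

With these toy inputs, matched pairs give a lower loss than shuffled pairs. Image-text retrieval follows the same pattern as `zero_shot_classify`: rank all candidate texts (or images) by the similarity scores instead of taking only the top one.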
