Xiayang Xiao

CV
h-index: 6
4 papers
17 citations
Novelty: 39%
AI Score: 28

4 Papers

CL | Feb 12, 2025 | Code
SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation

Zhiming Ma, Xiayang Xiao, Sihao Dong et al.

As a powerful all-weather Earth observation tool, synthetic aperture radar (SAR) remote sensing enables critical military reconnaissance, maritime surveillance, and infrastructure monitoring. Although vision-language models (VLMs) have made remarkable progress in natural language processing and image understanding, their applications remain limited in professional domains due to insufficient domain expertise. This paper proposes SARChat-2M, the first large-scale multimodal dialogue dataset for SAR images, which contains approximately 2 million high-quality image-text pairs and encompasses diverse scenarios with detailed target annotations. The dataset not only supports key tasks such as visual understanding and object detection, but also serves as a vision-language benchmark for the SAR domain, enabling and evaluating VLMs' capabilities in SAR image interpretation and providing a paradigmatic framework for constructing multimodal datasets across various remote sensing vertical domains. Experiments on 16 mainstream VLMs verify the effectiveness of the dataset. The project will be released at https://github.com/JimmyMa99/SARChat.

CV | Nov 3, 2024
OSAD: Open-Set Aircraft Detection in SAR Images

Xiayang Xiao, Zhuoxuan Li, Haipeng Wang

Current mainstream SAR image object detection methods still lack robustness when dealing with unknown objects in open environments. Open-set detection aims to enable detectors trained on a closed set to detect all known objects and identify unknown objects in open-set environments. The key challenges are how to improve generalization to potential unknown objects and how to reduce the empirical classification risk of known categories under strong supervision. To address these challenges, a novel open-set aircraft detector for SAR images is proposed, named Open-Set Aircraft Detection (OSAD), which is equipped with three dedicated components: global context modeling (GCM), location quality-driven pseudo labeling generation (LPG), and prototype contrastive learning (PCL). GCM enhances the network's representation of objects through attention maps formed by capturing long sequential positional relationships. LPG leverages clues about object positions and shapes to optimize localization quality, avoiding overfitting to known category information and enhancing generalization to potential unknown objects. PCL employs a prototype-based contrastive encoding loss to promote instance-level intra-class compactness and inter-class variance, aiming to minimize the overlap between known and unknown distributions and reduce the empirical classification risk of known categories. Extensive experiments demonstrate that the proposed method can effectively detect unknown objects and exhibits competitive performance without compromising closed-set performance. The absolute gain in average precision on unknown objects ranges from 0 to 18.36%.
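The PCL idea in the abstract above (pull each instance toward its own class prototype, push it away from the others) can be illustrated with a generic prototype-based contrastive loss. This is a minimal sketch and not the OSAD implementation: the cosine similarity, the temperature value, the two-dimensional toy embeddings, and all function names are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototype_contrastive_loss(embedding, label, prototypes, temperature=0.1):
    """Cross-entropy over similarities to class prototypes (hypothetical helper).

    The loss is small when the embedding is close to the prototype of its
    own class and far from the other prototypes.
    """
    logits = [cosine(embedding, p) / temperature for p in prototypes]
    # Numerically stable log-sum-exp over all prototypes.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[label]


# Toy prototypes for two known classes (assumed values).
prototypes = [[1.0, 0.0], [0.0, 1.0]]

# An instance of class 0 that lies near its prototype, and one that lies far from it.
near = prototype_contrastive_loss([0.9, 0.1], 0, prototypes)
far = prototype_contrastive_loss([0.1, 0.9], 0, prototypes)
print(near < far)
```

Minimizing a loss of this form over training instances encourages compact clusters around each prototype, which is the intra-class compactness and inter-class separation the abstract describes.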

CV | Apr 7, 2024
Msmsfnet: a multi-stream and multi-scale fusion net for edge detection

Chenguang Liu, Chisheng Wang, Feifei Dong et al.

Edge detection is a long-standing problem in computer vision. Despite the efficiency of existing algorithms, their performance relies heavily on backbone weights pre-trained on the ImageNet dataset. The use of pre-trained weights in previous methods significantly increases the difficulty of designing new models for edge detection without relying on existing well-trained ImageNet models, as pre-training the model on the ImageNet dataset is expensive and becomes compulsory to ensure the fairness of comparison. Besides, the pre-training and fine-tuning strategy is not always useful and is sometimes even inaccessible. For instance, weights pre-trained on the ImageNet dataset are unlikely to be helpful for edge detection in Synthetic Aperture Radar (SAR) images due to strong differences in the statistics between optical images and SAR images. Moreover, no dataset for SAR image processing has a size comparable to that of the ImageNet dataset. In this work, we study the performance achievable by state-of-the-art deep learning based edge detectors on publicly available datasets when they are trained from scratch, and devise a new network architecture, the multi-stream and multi-scale fusion net (msmsfnet), for edge detection. We show in our experiments that, when all models are trained from scratch, our model outperforms state-of-the-art edge detectors on three publicly available datasets. We also demonstrate the efficiency of our model for edge detection in SAR images, where no useful pre-trained weights are available. Finally, we show that our model is able to achieve competitive performance on the BSDS500 dataset when the pre-trained weights are used.

CV | Nov 7, 2024
Electromagnetic Scattering Kernel Guided Reciprocal Point Learning for SAR Open-Set Recognition

Xiayang Xiao, Zhuoxuan Li, Ruyi Zhang et al.

Existing Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) methods are confined by the closed-environment assumption, which hinders their effective and robust handling of unknown target categories in open environments. Open Set Recognition (OSR), a pivotal facet of algorithmic practicality, aims to categorize known classes while labeling unknown ones as "unknown." The chief challenge in OSR is to concurrently mitigate the risk of generalizing features from a restricted set of known classes to numerous unknown samples and the risk of exposing the open space to potential unknown data. To enhance open-set SAR classification, a method called scattering kernel with reciprocal learning network is proposed. First, a feature learning framework is constructed based on reciprocal point learning (RPL), establishing a bounded space for potential unknown classes. This approach indirectly introduces unknown information into a learner confined to known classes, thereby acquiring more concise and discriminative representations. Second, considering the variability in the imaging of targets at different angles and the discreteness of components in SAR images, convolutional kernels are designed based on large-sized attribute scattering center (ASC) models. This enhances the ability to extract intrinsic non-linear features and specific scattering characteristics in SAR images, thereby improving the discriminative power of the model's features and mitigating the impact of imaging variations on classification performance. Experiments on the MSTAR datasets substantiate the superior performance of the proposed approach, called ASC-RPL, over mainstream methods.
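The open-set decision described in the abstract above (assign a known class or answer "unknown") can be sketched with a reciprocal-point style rule: each known class has a point that stands for everything outside that class, a sample is assigned to the class whose reciprocal point it is farthest from, and it is rejected when even that largest distance is small. This is only an illustration under stated assumptions, not the ASC-RPL method: the Euclidean distance, the threshold, the toy two-dimensional features, and the class names are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify_open_set(feature, reciprocal_points, threshold):
    """Return a known class name or "unknown" (hypothetical helper).

    reciprocal_points maps each known class to a point that represents the
    space outside that class, so a large distance means strong membership.
    """
    distances = {
        name: euclidean(feature, point)
        for name, point in reciprocal_points.items()
    }
    best = max(distances, key=distances.get)
    # Samples that stay close to every reciprocal point fall in the bounded
    # region reserved for unknowns and are rejected.
    return best if distances[best] >= threshold else "unknown"


# Assumed toy setup: class_a features cluster near (2, 0), class_b near (-2, 0).
reciprocal_points = {"class_a": [-1.0, 0.0], "class_b": [1.0, 0.0]}

print(classify_open_set([2.0, 0.0], reciprocal_points, threshold=2.0))
print(classify_open_set([-2.0, 0.0], reciprocal_points, threshold=2.0))
print(classify_open_set([0.0, 0.2], reciprocal_points, threshold=2.0))
```

In a trained model the reciprocal points and the feature extractor would be learned jointly, and the threshold would be chosen on validation data; here both are fixed by hand to keep the example self-contained.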