CVDec 19, 2022
AI Security for Geoscience and Remote Sensing: Challenges and Future TrendsYonghao Xu, Tao Bai, Weikang Yu et al.
Recent advances in artificial intelligence (AI) have significantly intensified research in the geoscience and remote sensing (RS) field. AI algorithms, especially deep learning-based ones, have been developed and applied widely to RS data analysis. The successful application of AI covers almost all aspects of Earth observation (EO) missions, from low-level vision tasks like super-resolution, denoising and inpainting, to high-level vision tasks like scene classification, object detection and semantic segmentation. While AI techniques enable researchers to observe and understand the Earth more accurately, the vulnerability and uncertainty of AI models deserve further attention, considering that many geoscience and RS tasks are highly safety-critical. This paper reviews the current development of AI security in the geoscience and RS field, covering the following five important aspects: adversarial attack, backdoor attack, federated learning, uncertainty and explainability. Moreover, the potential opportunities and trends are discussed to provide insights for future research. To the best of the authors' knowledge, this paper is the first attempt to provide a systematic review of AI security-related research in the geoscience and RS community. Available code and datasets are also listed in the paper to move this vibrant field of research forward.
AIDec 17, 2025Code
CangLing-KnowFlow: A Unified Knowledge-and-Flow-fused Agent for Comprehensive Remote Sensing ApplicationsZhengchao Chen, Haoran Wang, Jing Yao et al.
The automated and intelligent processing of massive remote sensing (RS) datasets is critical in Earth observation (EO). Existing automated systems are normally task-specific, lacking a unified framework to manage diverse, end-to-end workflows--from data preprocessing to advanced interpretation--across diverse RS applications. To address this gap, this paper introduces CangLing-KnowFlow, a unified intelligent agent framework that integrates a Procedural Knowledge Base (PKB), Dynamic Workflow Adjustment, and an Evolutionary Memory Module. The PKB, comprising 1,008 expert-validated workflow cases across 162 practical RS tasks, guides planning and substantially reduces hallucinations common in general-purpose agents. During runtime failures, the Dynamic Workflow Adjustment autonomously diagnoses and replans recovery strategies, while the Evolutionary Memory Module continuously learns from these events, iteratively enhancing the agent's knowledge and performance. This synergy enables CangLing-KnowFlow to adapt, learn, and operate reliably across diverse, complex tasks. We evaluated CangLing-KnowFlow on the KnowFlow-Bench, a novel benchmark of 324 workflows inspired by real-world applications, testing its performance across 13 top Large Language Model (LLM) backbones, from open-source to commercial. Across all complex tasks, CangLing-KnowFlow surpassed the Reflexion baseline by at least 4% in Task Success Rate. As the first most comprehensive validation along this emerging field, this research demonstrates the great potential of CangLing-KnowFlow as a robust, efficient, and scalable automated solution for complex EO challenges by leveraging expert knowledge (Knowledge) into adaptive and verifiable procedures (Flow).
CVSep 18, 2021Code
UNetFormer: A UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene ImageryLibo Wang, Rui Li, Ce Zhang et al.
Semantic segmentation of remotely sensed urban scene images is required in a wide range of practical applications, such as land cover mapping, urban change detection, environmental protection, and economic assessment.Driven by rapid developments in deep learning technologies, the convolutional neural network (CNN) has dominated semantic segmentation for many years. CNN adopts hierarchical feature representation, demonstrating strong capabilities for local information extraction. However, the local property of the convolution layer limits the network from capturing the global context. Recently, as a hot topic in the domain of computer vision, Transformer has demonstrated its great potential in global information modelling, boosting many vision-related tasks such as image classification, object detection, and particularly semantic segmentation. In this paper, we propose a Transformer-based decoder and construct a UNet-like Transformer (UNetFormer) for real-time urban scene segmentation. For efficient segmentation, the UNetFormer selects the lightweight ResNet18 as the encoder and develops an efficient global-local attention mechanism to model both global and local information in the decoder. Extensive experiments reveal that our method not only runs faster but also produces higher accuracy compared with state-of-the-art lightweight models. Specifically, the proposed UNetFormer achieved 67.8% and 52.4% mIoU on the UAVid and LoveDA datasets, respectively, while the inference speed can achieve up to 322.4 FPS with a 512x512 input on a single NVIDIA GTX 3090 GPU. In further exploration, the proposed Transformer-based decoder combined with a Swin Transformer encoder also achieves the state-of-the-art result (91.3% F1 and 84.1% mIoU) on the Vaihingen dataset. The source code will be freely available at https://github.com/WangLibo1995/GeoSeg.
IVJul 26, 2020Code
MACU-Net for Semantic Segmentation of Fine-Resolution Remotely Sensed ImagesRui Li, Chenxi Duan, Shunyi Zheng et al.
Semantic segmentation of remotely sensed images plays an important role in land resource management, yield estimation, and economic assessment. U-Net, a deep encoder-decoder architecture, has been used frequently for image segmentation with high accuracy. In this Letter, we incorporate multi-scale features generated by different layers of U-Net and design a multi-scale skip connected and asymmetric-convolution-based U-Net (MACU-Net), for segmentation using fine-resolution remotely sensed images. Our design has the following advantages: (1) The multi-scale skip connections combine and realign semantic features contained in both low-level and high-level feature maps; (2) the asymmetric convolution block strengthens the feature representation and feature extraction capability of a standard convolution layer. Experiments conducted on two remotely sensed datasets captured by different satellite sensors demonstrate that the proposed MACU-Net transcends the U-Net, U-NetPPL, U-Net 3+, amongst other benchmark approaches. Code is available at https://github.com/lironui/MACU-Net.
CLAug 1, 2025
Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training DataSohaib Imran, Rob Lamb, Peter M. Atkinson
Large language models (LLMs) are trained on large corpora, yet it is unclear whether they can reason about the information present within their training data. We design experiments to study out-of-context abduction in LLMs, the ability to infer the most plausible explanations for observations using relevant facts present in training data. We train treatment LLMs on names and behavior descriptions of fictitious chatbots, but not on examples of dialogue with the chatbots. We find that OpenAI's GPT 4o LLM can correctly infer at least one chatbot's name after observing example responses characteristic of that chatbot. We also find that previously training GPT 4o on descriptions of a chatbot's behavior allows it to display behaviors more characteristic of the chatbot when iteratively trained to display such behaviors. Our results have implications for situational awareness in LLMs and, therefore, for AI safety.
CLJul 23, 2025
Are LLM Belief Updates Consistent with Bayes' Theorem?Sohaib Imran, Ihor Kendiukhov, Matthew Broerman et al.
Do larger and more capable language models learn to update their "beliefs" about propositions more consistently with Bayes' theorem when presented with evidence in-context? To test this, we formulate a Bayesian Coherence Coefficient (BCC) metric and generate a dataset with which to measure the BCC. We measure BCC for multiple pre-trained-only language models across five model families, comparing against the number of model parameters, the amount of training data, and model scores on common benchmarks. Our results provide evidence for our hypothesis that larger and more capable pre-trained language models assign credences that are more coherent with Bayes' theorem. These results have important implications for our understanding and governance of LLMs.
CVOct 17, 2025
Standardization for improved Spatio-Temporal Image FusionHarkaitz Goyena, Peter M. Atkinson, Unai Pérez-Goya et al.
Spatio-Temporal Image Fusion (STIF) methods usually require sets of images with matching spatial and spectral resolutions captured by different sensors. To facilitate the application of STIF methods, we propose and compare two different standardization approaches. The first method is based on traditional upscaling of the fine-resolution images. The second method is a sharpening approach called Anomaly Based Satellite Image Standardization (ABSIS) that blends the overall features found in the fine-resolution image series with the distinctive attributes of a specific coarse-resolution image to produce images that more closely resemble the outcome of aggregating the fine-resolution images. Both methods produce a significant increase in accuracy of the Unpaired Spatio Temporal Fusion of Image Patches (USTFIP) STIF method, with the sharpening approach increasing the spectral and spatial accuracies of the fused images by up to 49.46\% and 78.40\%, respectively.
CVApr 28, 2025
Crowd Detection Using Very-Fine-Resolution Satellite ImageryTong Xiao, Qunming Wang, Ping Lu et al.
Accurate crowd detection (CD) is critical for public safety and historical pattern analysis, yet existing methods relying on ground and aerial imagery suffer from limited spatio-temporal coverage. The development of very-fine-resolution (VFR) satellite sensor imagery (e.g., ~0.3 m spatial resolution) provides unprecedented opportunities for large-scale crowd activity analysis, but it has never been considered for this task. To address this gap, we proposed CrowdSat-Net, a novel point-based convolutional neural network, which features two innovative components: Dual-Context Progressive Attention Network (DCPAN) to improve feature representation of individuals by aggregating scene context and local individual characteristics, and High-Frequency Guided Deformable Upsampler (HFGDU) that recovers high-frequency information during upsampling through frequency-domain guided deformable convolutions. To validate the effectiveness of CrowdSat-Net, we developed CrowdSat, the first VFR satellite imagery dataset designed specifically for CD tasks, comprising over 120k manually labeled individuals from multi-source satellite platforms (Beijing-3N, Jilin-1 Gaofen-04A and Google Earth) across China. In the experiments, CrowdSat-Net was compared with five state-of-the-art point-based CD methods (originally designed for ground or aerial imagery) using CrowdSat and achieved the largest F1-score of 66.12% and Precision of 73.23%, surpassing the second-best method by 1.71% and 2.42%, respectively. Moreover, extensive ablation experiments validated the importance of the DCPAN and HFGDU modules. Furthermore, cross-regional evaluation further demonstrated the spatial generalizability of CrowdSat-Net. This research advances CD capability by providing both a newly developed network architecture for CD and a pioneering benchmark dataset to facilitate future CD development.
CVMar 14, 2021
Scale-aware Neural Network for Semantic Segmentation of Multi-resolution Remote Sensing ImagesLibo Wang, Ce Zhang, Rui Li et al.
Assigning geospatial objects with specific categories at the pixel level is a fundamental task in remote sensing image analysis. Along with rapid development in sensor technologies, remotely sensed images can be captured at multiple spatial resolutions (MSR) with information content manifested at different scales. Extracting information from these MSR images represents huge opportunities for enhanced feature representation and characterisation. However, MSR images suffer from two critical issues: 1) increased scale variation of geo-objects and 2) loss of detailed information at coarse spatial resolutions. To bridge these gaps, in this paper, we propose a novel scale-aware neural network (SaNet) for semantic segmentation of MSR remotely sensed imagery. SaNet deploys a densely connected feature network (DCFFM) module to capture high-quality multi-scale context, such that the scale variation is handled properly and the quality of segmentation is increased for both large and small objects. A spatial feature recalibration (SFRM) module is further incorporated into the network to learn intact semantic content with enhanced spatial relationships, where the negative effects of information loss are removed. The combination of DCFFM and SFRM allows SaNet to learn scale-aware feature representation, which outperforms the existing multi-scale feature representation. Extensive experiments on three semantic segmentation datasets demonstrated the effectiveness of the proposed SaNet in cross-resolution segmentation.
CVNov 22, 2020
Robust Unsupervised Small Area Change Detection from SAR Imagery Using Deep LearningXinzheng Zhang, Hang Su, Ce Zhang et al.
Small area change detection from synthetic aperture radar (SAR) is a highly challenging task. In this paper, a robust unsupervised approach is proposed for small area change detection from multi-temporal SAR images using deep learning. First, a multi-scale superpixel reconstruction method is developed to generate a difference image (DI), which can suppress the speckle noise effectively and enhance edges by exploiting local, spatially homogeneous information. Second, a two-stage centre-constrained fuzzy c-means clustering algorithm is proposed to divide the pixels of the DI into changed, unchanged and intermediate classes with a parallel clustering strategy. Image patches belonging to the first two classes are then constructed as pseudo-label training samples, and image patches of the intermediate class are treated as testing samples. Finally, a convolutional wavelet neural network (CWNN) is designed and trained to classify testing samples into changed or unchanged classes, coupled with a deep convolutional generative adversarial network (DCGAN) to increase the number of changed class within the pseudo-label training samples. Numerical experiments on four real SAR datasets demonstrate the validity and robustness of the proposed approach, achieving up to 99.61% accuracy for small area change detection.
CVMar 3, 2020
A Robust Imbalanced SAR Image Change Detection Approach Based on Deep Difference Image and PCANetXinzheng Zhang, Hang Su, Ce Zhang et al.
In this research, a novel robust change detection approach is presented for imbalanced multi-temporal synthetic aperture radar (SAR) image based on deep learning. Our main contribution is to develop a novel method for generating difference image and a parallel fuzzy c-means (FCM) clustering method. The main steps of our proposed approach are as follows: 1) Inspired by convolution and pooling in deep learning, a deep difference image (DDI) is obtained based on parameterized pooling leading to better speckle suppression and feature enhancement than traditional difference images. 2) Two different parameter Sigmoid nonlinear mapping are applied to the DDI to get two mapped DDIs. Parallel FCM are utilized on these two mapped DDIs to obtain three types of pseudo-label pixels, namely, changed pixels, unchanged pixels, and intermediate pixels. 3) A PCANet with support vector machine (SVM) are trained to classify intermediate pixels to be changed or unchanged. Three imbalanced multi-temporal SAR image sets are used for change detection experiments. The experimental results demonstrate that the proposed approach is effective and robust for imbalanced SAR data, and achieve up to 99.52% change detection accuracy superior to most state-of-the-art methods.
LGDec 19, 2018
Multisource and Multitemporal Data Fusion in Remote SensingPedram Ghamisi, Behnood Rasti, Naoto Yokoya et al.
The sharp and recent increase in the availability of data captured by different sensors combined with their considerably heterogeneous natures poses a serious challenge for the effective and efficient processing of remotely sensed data. Such an increase in remote sensing and ancillary datasets, however, opens up the possibility of utilizing multimodal datasets in a joint manner to further improve the performance of the processing approaches with respect to the application at hand. Multisource data fusion has, therefore, received enormous attention from researchers worldwide for a wide variety of applications. Moreover, thanks to the revisit capability of several spaceborne sensors, the integration of the temporal information with the spatial and/or spectral/backscattering information of the remotely sensed data is possible and helps to move from a representation of 2D/3D data to 4D data structures, where the time variable adds new information as well as challenges for the information extraction algorithms. There are a huge number of research works dedicated to multisource and multitemporal data fusion, but the methods for the fusion of different modalities have expanded in different paths according to each research community. This paper brings together the advances of multisource and multitemporal data fusion approaches with respect to different research communities and provides a thorough and discipline-specific starting point for researchers at different levels (i.e., students, researchers, and senior researchers) willing to conduct novel investigations on this challenging topic by supplying sufficient detail and references.