CVAug 8, 2022Code
Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern Hopfield NetworksYonghao Xu, Weikang Yu, Pedram Ghamisi et al.
The synthesis of high-resolution remote sensing images based on text descriptions has great potential in many practical application scenarios. Although deep neural networks have achieved great success in many important remote sensing tasks, generating realistic remote sensing images from text descriptions is still very difficult. To address this challenge, we propose a novel text-to-image modern Hopfield network (Txt2Img-MHN). The main idea of Txt2Img-MHN is to conduct hierarchical prototype learning on both text and image embeddings with modern Hopfield layers. Instead of directly learning concrete but highly diverse text-image joint feature representations for different semantics, Txt2Img-MHN aims to learn the most representative prototypes from text-image embeddings, achieving a coarse-to-fine learning strategy. These learned prototypes can then be utilized to represent more complex semantics in the text-to-image generation task. To better evaluate the realism and semantic consistency of the generated images, we further conduct zero-shot classification on real remote sensing data using the classification model trained on synthesized images. Despite its simplicity, we find that the overall accuracy in the zero-shot classification may serve as a good metric to evaluate the ability to generate an image from text. Extensive experiments on the benchmark remote sensing text-image dataset demonstrate that the proposed Txt2Img-MHN can generate more realistic remote sensing images than existing methods. Code and pre-trained models are available online (https://github.com/YonghaoXu/Txt2Img-MHN).
CVNov 15, 2022Code
Backdoor Attacks for Remote Sensing Data with Wavelet TransformNikolaus Dräger, Yonghao Xu, Pedram Ghamisi
Recent years have witnessed the great success of deep learning algorithms in the geoscience and remote sensing realm. Nevertheless, the security and robustness of deep learning models deserve special attention when addressing safety-critical remote sensing tasks. In this paper, we provide a systematic analysis of backdoor attacks for remote sensing data, where both scene classification and semantic segmentation tasks are considered. While most of the existing backdoor attack algorithms rely on visible triggers like squared patches with well-designed patterns, we propose a novel wavelet transform-based attack (WABA) method, which can achieve invisible attacks by injecting the trigger image into the poisoned image in the low-frequency domain. In this way, the high-frequency information in the trigger image can be filtered out in the attack, resulting in stealthy data poisoning. Despite its simplicity, the proposed method can significantly cheat the current state-of-the-art deep learning models with a high attack success rate. We further analyze how different trigger images and the hyper-parameters in the wavelet transform would influence the performance of the proposed method. Extensive experiments on four benchmark remote sensing datasets demonstrate the effectiveness of the proposed method for both scene classification and semantic segmentation tasks and thus highlight the importance of designing advanced backdoor defense algorithms to address this threat in remote sensing scenarios. The code will be available online at \url{https://github.com/ndraeger/waba}.
CVJul 31, 2023Code
Universal Adversarial Defense in Remote Sensing Based on Pre-trained Denoising Diffusion ModelsWeikang Yu, Yonghao Xu, Pedram Ghamisi
Deep neural networks (DNNs) have risen to prominence as key solutions in numerous AI applications for earth observation (AI4EO). However, their susceptibility to adversarial examples poses a critical challenge, compromising the reliability of AI4EO algorithms. This paper presents a novel Universal Adversarial Defense approach in Remote Sensing Imagery (UAD-RS), leveraging pre-trained diffusion models to protect DNNs against universal adversarial examples exhibiting heterogeneous patterns. Specifically, a universal adversarial purification framework is developed utilizing pre-trained diffusion models to mitigate adversarial perturbations through the introduction of Gaussian noise and subsequent purification of the perturbations from adversarial examples. Additionally, an Adaptive Noise Level Selection (ANLS) mechanism is introduced to determine the optimal noise level for the purification framework with a task-guided Frechet Inception Distance (FID) ranking strategy, thereby enhancing purification performance. Consequently, only a single pre-trained diffusion model is required for purifying universal adversarial samples with heterogeneous patterns across each dataset, significantly reducing training efforts for multiple attack settings while maintaining high performance without prior knowledge of adversarial perturbations. Experimental results on four heterogeneous RS datasets, focusing on scene classification and semantic segmentation, demonstrate that UAD-RS outperforms state-of-the-art adversarial purification approaches, providing universal defense against seven commonly encountered adversarial perturbations. Codes and the pre-trained models are available online (https://github.com/EricYu97/UAD-RS).
LGJun 7, 2023Code
Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper CalibrationEmanuel Sanchez Aimar, Nathaniel Helgesen, Yonghao Xu et al.
Long-tailed semi-supervised learning (LTSSL) represents a practical scenario for semi-supervised applications, challenged by skewed labeled distributions that bias classifiers. This problem is often aggravated by discrepancies between labeled and unlabeled class distributions, leading to biased pseudo-labels, neglect of rare classes, and poorly calibrated probabilities. To address these issues, we introduce Flexible Distribution Alignment (FlexDA), a novel adaptive logit-adjusted loss framework designed to dynamically estimate and align predictions with the actual distribution of unlabeled data and achieve a balanced classifier by the end of training. FlexDA is further enhanced by a distillation-based consistency loss, promoting fair data usage across classes and effectively leveraging underconfident samples. This method, encapsulated in ADELLO (Align and Distill Everything All at Once), proves robust against label shift, significantly improves model calibration in LTSSL contexts, and surpasses previous state-of-of-art approaches across multiple benchmarks, including CIFAR100-LT, STL10-LT, and ImageNet127, addressing class imbalance challenges in semi-supervised learning. Our code is available at https://github.com/emasa/ADELLO-LTSSL.
CVJan 22Code
Towards Realistic Remote Sensing Dataset Distillation with Discriminative Prototype-guided DiffusionYonghao Xu, Pedram Ghamisi, Qihao Weng
Recent years have witnessed the remarkable success of deep learning in remote sensing image interpretation, driven by the availability of large-scale benchmark datasets. However, this reliance on massive training data also brings two major challenges: (1) high storage and computational costs, and (2) the risk of data leakage, especially when sensitive categories are involved. To address these challenges, this study introduces the concept of dataset distillation into the field of remote sensing image interpretation for the first time. Specifically, we train a text-to-image diffusion model to condense a large-scale remote sensing dataset into a compact and representative distilled dataset. To improve the discriminative quality of the synthesized samples, we propose a classifier-driven guidance by injecting a classification consistency loss from a pre-trained model into the diffusion training process. Besides, considering the rich semantic complexity of remote sensing imagery, we further perform latent space clustering on training samples to select representative and diverse prototypes as visual style guidance, while using a visual language model to provide aggregated text descriptions. Experiments on three high-resolution remote sensing scene classification benchmarks show that the proposed method can distill realistic and diverse samples for downstream model training. Code and pre-trained models are available online (https://github.com/YonghaoXu/DPD).
CVSep 6, 2022
The Outcome of the 2022 Landslide4Sense Competition: Advanced Landslide Detection from Multi-Source Satellite ImageryOmid Ghorbanzadeh, Yonghao Xu, Hengwei Zhao et al.
The scientific outcomes of the 2022 Landslide4Sense (L4S) competition organized by the Institute of Advanced Research in Artificial Intelligence (IARAI) are presented here. The objective of the competition is to automatically detect landslides based on large-scale multiple sources of satellite imagery collected globally. The 2022 L4S aims to foster interdisciplinary research on recent developments in deep learning (DL) models for the semantic segmentation task using satellite imagery. In the past few years, DL-based models have achieved performance that meets expectations on image interpretation, due to the development of convolutional neural networks (CNNs). The main objective of this article is to present the details and the best-performing algorithms featured in this competition. The winning solutions are elaborated with state-of-the-art models like the Swin Transformer, SegFormer, and U-Net. Advanced machine learning techniques and strategies such as hard example mining, self-training, and mix-up data augmentation are also considered. Moreover, we describe the L4S benchmark data set in order to facilitate further comparisons, and report the results of the accuracy assessment online. The data is accessible on \textit{Future Development Leaderboard} for future evaluation at \url{https://www.iarai.ac.at/landslide4sense/challenge/}, and researchers are invited to submit more prediction results, evaluate the accuracy of their methods, compare them with those of other users, and, ideally, improve the landslide detection results reported in this article.
CVDec 19, 2022
AI Security for Geoscience and Remote Sensing: Challenges and Future TrendsYonghao Xu, Tao Bai, Weikang Yu et al.
Recent advances in artificial intelligence (AI) have significantly intensified research in the geoscience and remote sensing (RS) field. AI algorithms, especially deep learning-based ones, have been developed and applied widely to RS data analysis. The successful application of AI covers almost all aspects of Earth observation (EO) missions, from low-level vision tasks like super-resolution, denoising and inpainting, to high-level vision tasks like scene classification, object detection and semantic segmentation. While AI techniques enable researchers to observe and understand the Earth more accurately, the vulnerability and uncertainty of AI models deserve further attention, considering that many geoscience and RS tasks are highly safety-critical. This paper reviews the current development of AI security in the geoscience and RS field, covering the following five important aspects: adversarial attack, backdoor attack, federated learning, uncertainty and explainability. Moreover, the potential opportunities and trends are discussed to provide insights for future research. To the best of the authors' knowledge, this paper is the first attempt to provide a systematic review of AI security-related research in the geoscience and RS community. Available code and datasets are also listed in the paper to move this vibrant field of research forward.
CVJun 1, 2022
Landslide4Sense: Reference Benchmark Data and Deep Learning Models for Landslide DetectionOmid Ghorbanzadeh, Yonghao Xu, Pedram Ghamisi et al.
This study introduces \textit{Landslide4Sense}, a reference benchmark for landslide detection from remote sensing. The repository features 3,799 image patches fusing optical layers from Sentinel-2 sensors with the digital elevation model and slope layer derived from ALOS PALSAR. The added topographical information facilitates the accurate detection of landslide borders, which recent researches have shown to be challenging using optical data alone. The extensive data set supports deep learning (DL) studies in landslide detection and the development and validation of methods for the systematic update of landslide inventories. The benchmark data set has been collected at four different times and geographical locations: Iburi (September 2018), Kodagu (August 2018), Gorkha (April 2015), and Taiwan (August 2009). Each image pixel is labelled as belonging to a landslide or not, incorporating various sources and thorough manual annotation. We then evaluate the landslide detection performance of 11 state-of-the-art DL segmentation models: U-Net, ResU-Net, PSPNet, ContextNet, DeepLab-v2, DeepLab-v3+, FCN-8s, LinkNet, FRRN-A, FRRN-B, and SQNet. All models were trained from scratch on patches from one quarter of each study area and tested on independent patches from the other three quarters. Our experiments demonstrate that ResU-Net outperformed the other models for the landslide detection task. We make the multi-source landslide benchmark data (Landslide4Sense) and the tested DL models publicly available at \url{https://www.iarai.ac.at/landslide4sense}, establishing an important resource for remote sensing, computer vision, and machine learning communities in studies of image classification in general and applications to landslide detection in particular.
CVApr 10Code
GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote SensingAoran Xiao, Shihao Cheng, Yonghao Xu et al.
Recent advances in multimodal large language models (MLLMs) have accelerated progress in domain-oriented AI, yet their development in geoscience and remote sensing (RS) remains constrained by distinctive challenges: wide-ranging disciplinary knowledge, heterogeneous sensor modalities, and a fragmented spectrum of tasks. To bridge these gaps, we introduce GeoMMBench, a comprehensive multimodal question-answering benchmark covering diverse RS disciplines, sensors, and tasks, enabling broader and more rigorous evaluation than prior benchmarks. Using GeoMMBench, we assess 36 open-source and proprietary large language models, uncovering systematic deficiencies in domain knowledge, perceptual grounding, and reasoning--capabilities essential for expert-level geospatial interpretation. Beyond evaluation, we propose GeoMMAgent, a multi-agent framework that strategically integrates retrieval, perception, and reasoning through domain-specific RS models and tools. Extensive experimental results demonstrate that GeoMMAgent significantly outperforms standalone LLMs, underscoring the importance of tool-augmented agents for dynamically tackling complex geoscience and RS challenges.
CVOct 30, 2023
There Are No Data Like More Data- Datasets for Deep Learning in Earth ObservationMichael Schmitt, Seyed Ali Ahmadi, Yonghao Xu et al.
Carefully curated and annotated datasets are the foundation of machine learning, with particularly data-hungry deep neural networks forming the core of what is often called Artificial Intelligence (AI). Due to the massive success of deep learning applied to Earth Observation (EO) problems, the focus of the community has been largely on the development of ever-more sophisticated deep neural network architectures and training strategies largely ignoring the overall importance of datasets. For that purpose, numerous task-specific datasets have been created that were largely ignored by previously published review articles on AI for Earth observation. With this article, we want to change the perspective and put machine learning datasets dedicated to Earth observation data and applications into the spotlight. Based on a review of the historical developments, currently available resources are described and a perspective for future developments is formed. We hope to contribute to an understanding that the nature of our data is what distinguishes the Earth observation community from many other communities that apply deep learning techniques to image data, and that a detailed understanding of EO data peculiarities is among the core competencies of our discipline.
CVMay 10, 2024Code
Multi-Target Unsupervised Domain Adaptation for Semantic Segmentation without External DataYonghao Xu, Pedram Ghamisi, Yannis Avrithis
Multi-target unsupervised domain adaptation (UDA) aims to learn a unified model to address the domain shift between multiple target domains. Due to the difficulty of obtaining annotations for dense predictions, it has recently been introduced into cross-domain semantic segmentation. However, most existing solutions require labeled data from the source domain and unlabeled data from multiple target domains concurrently during training. Collectively, we refer to this data as "external". When faced with new unlabeled data from an unseen target domain, these solutions either do not generalize well or require retraining from scratch on all data. To address these challenges, we introduce a new strategy called "multi-target UDA without external data" for semantic segmentation. Specifically, the segmentation model is initially trained on the external data. Then, it is adapted to a new unseen target domain without accessing any external data. This approach is thus more scalable than existing solutions and remains applicable when external data is inaccessible. We demonstrate this strategy using a simple method that incorporates self-distillation and adversarial learning, where knowledge acquired from the external data is preserved during adaptation through "one-way" adversarial learning. Extensive experiments in several synthetic-to-real and real-to-real adaptation settings on four benchmark urban driving datasets show that our method significantly outperforms current state-of-the-art solutions, even in the absence of external data. Our source code is available online (https://github.com/YonghaoXu/UT-KD).
CVMar 11, 2025Code
Comparing Next-Day Wildfire Predictability of MODIS and VIIRS Satellite DataJustus Karlsson, Yonghao Xu, Amanda Berg et al.
Multiple studies have performed next-day fire prediction using satellite imagery. Two main satellites are used to detect wildfires: MODIS and VIIRS. Both satellites provide fire mask products, called MOD14 and VNP14, respectively. Studies have used one or the other, but there has been no comparison between them to determine which might be more suitable for next-day fire prediction. In this paper, we first evaluate how well VIIRS and MODIS data can be used to forecast wildfire spread one day ahead. We find that the model using VIIRS as input and VNP14 as target achieves the best results. Interestingly, the model using MODIS as input and VNP14 as target performs significantly better than using VNP14 as input and MOD14 as target. Next, we discuss why MOD14 might be harder to use for predicting next-day fires. We find that the MOD14 fire mask is highly stochastic and does not correlate with reasonable fire spread patterns. This is detrimental for machine learning tasks, as the model learns irrational patterns. Therefore, we conclude that MOD14 is unsuitable for next-day fire prediction and that VNP14 is a much better option. However, using MODIS input and VNP14 as target, we achieve a significant improvement in predictability. This indicates that an improved fire detection model is possible for MODIS. The full code and dataset is available online: https://github.com/justuskarlsson/wildfire-mod14-vnp14
CVFeb 14, 2022Code
Universal Adversarial Examples in Remote Sensing: Methodology and BenchmarkYonghao Xu, Pedram Ghamisi
Deep neural networks have achieved great success in many important remote sensing tasks. Nevertheless, their vulnerability to adversarial examples should not be neglected. In this study, we systematically analyze the universal adversarial examples in remote sensing data for the first time, without any knowledge from the victim model. Specifically, we propose a novel black-box adversarial attack method, namely Mixup-Attack, and its simple variant Mixcut-Attack, for remote sensing data. The key idea of the proposed methods is to find common vulnerabilities among different networks by attacking the features in the shallow layer of a given surrogate model. Despite their simplicity, the proposed methods can generate transferable adversarial examples that deceive most of the state-of-the-art deep neural networks in both scene classification and semantic segmentation tasks with high success rates. We further provide the generated universal adversarial examples in the dataset named UAE-RS, which is the first dataset that provides black-box adversarial samples in the remote sensing field. We hope UAE-RS may serve as a benchmark that helps researchers to design deep neural networks with strong resistance toward adversarial attacks in the remote sensing field. Codes and the UAE-RS dataset are available online (https://github.com/YonghaoXu/UAE-RS).
CVFeb 8, 2022Code
Consistency-Regularized Region-Growing Network for Semantic Segmentation of Urban Scenes with Point-Level AnnotationsYonghao Xu, Pedram Ghamisi
Deep learning algorithms have obtained great success in semantic segmentation of very high-resolution (VHR) images. Nevertheless, training these models generally requires a large amount of accurate pixel-wise annotations, which is very laborious and time-consuming to collect. To reduce the annotation burden, this paper proposes a consistency-regularized region-growing network (CRGNet) to achieve semantic segmentation of VHR images with point-level annotations. The key idea of CRGNet is to iteratively select unlabeled pixels with high confidence to expand the annotated area from the original sparse points. However, since there may exist some errors and noises in the expanded annotations, directly learning from them may mislead the training of the network. To this end, we further propose the consistency regularization strategy, where a base classifier and an expanded classifier are employed. Specifically, the base classifier is supervised by the original sparse annotations, while the expanded classifier aims to learn from the expanded annotations generated by the base classifier with the region-growing mechanism. The consistency regularization is thereby achieved by minimizing the discrepancy between the predictions from both the base and the expanded classifiers. We find such a simple regularization strategy is yet very useful to control the quality of the region-growing mechanism. Extensive experiments on two benchmark datasets demonstrate that the proposed CRGNet significantly outperforms the existing state-of-the-art methods. Codes and pre-trained models are available online (https://github.com/YonghaoXu/CRGNet).
CVDec 15, 2021Code
Self-Ensembling GAN for Cross-Domain Semantic SegmentationYonghao Xu, Fengxiang He, Bo Du et al.
Deep neural networks (DNNs) have greatly contributed to the performance gains in semantic segmentation. Nevertheless, training DNNs generally requires large amounts of pixel-level labeled data, which is expensive and time-consuming to collect in practice. To mitigate the annotation burden, this paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation. In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which together with a discriminator, forms a GAN. Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model, the latter of which is a common barrier shared by most adversarial training-based methods. We theoretically analyze SE-GAN and provide an $\mathcal O(1/\sqrt{N})$ generalization bound ($N$ is the training sample size), which suggests controlling the discriminator's hypothesis complexity to enhance the generalizability. Accordingly, we choose a simple network as the discriminator. Extensive and systematic experiments in two standard settings demonstrate that the proposed method significantly outperforms current state-of-the-art approaches. The source code of our model is available online (https://github.com/YonghaoXu/SE-GAN).
CVApr 8, 2021Code
Robust Self-Ensembling Network for Hyperspectral Image ClassificationYonghao Xu, Bo Du, Liangpei Zhang
Recent research has shown the great potential of deep learning algorithms in the hyperspectral image (HSI) classification task. Nevertheless, training these models usually requires a large amount of labeled data. Since the collection of pixel-level annotations for HSI is laborious and time-consuming, developing algorithms that can yield good performance in the small sample size situation is of great significance. In this study, we propose a robust self-ensembling network (RSEN) to address this problem. The proposed RSEN consists of two subnetworks including a base network and an ensemble network. With the constraint of both the supervised loss from the labeled data and the unsupervised loss from the unlabeled data, the base network and the ensemble network can learn from each other, achieving the self-ensembling mechanism. To the best of our knowledge, the proposed method is the first attempt to introduce the self-ensembling technique into the HSI classification task, which provides a different view on how to utilize the unlabeled data in HSI to assist the network training. We further propose a novel consistency filter to increase the robustness of self-ensembling learning. Extensive experiments on three benchmark HSI datasets demonstrate that the proposed algorithm can yield competitive performance compared with the state-of-the-art methods. Code is available online (\url{https://github.com/YonghaoXu/RSEN}).
CVDec 12, 2025
VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote SensingEmanuel Sánchez Aimar, Gulnaz Zhambulova, Fahad Shahbaz Khan et al.
Satellite imagery differs fundamentally from natural images: its aerial viewpoint, very high resolution, diverse scale variations, and abundance of small objects demand both region-level spatial reasoning and holistic scene understanding. Current remote-sensing approaches remain fragmented between dual-encoder retrieval models, which excel at large-scale cross-modal search but cannot interleave modalities, and generative assistants, which support region-level interpretation but lack scalable retrieval capabilities. We propose $\textbf{VLM2GeoVec}$, an instruction-following, single-encoder vision-language model trained contrastively to embed interleaved inputs (images, text, bounding boxes, and geographic coordinates) in a unified vector space. Our single encoder interleaves all inputs into one joint embedding trained with a contrastive loss, eliminating multi-stage pipelines and task-specific modules. To evaluate its versatility, we introduce $\textbf{RSMEB}$, a novel benchmark covering key remote-sensing embedding applications: scene classification; cross-modal search; compositional retrieval; visual-question answering; visual grounding and region-level reasoning; and semantic geospatial retrieval. On RSMEB, it achieves $\textbf{26.6%}$ P@1 on region-caption retrieval (+25 pp vs. dual-encoder baselines), $\textbf{32.5%}$ P@1 on referring-expression retrieval (+19 pp), and $\textbf{17.8%}$ P@1 on semantic geo-localization retrieval (over $3\times$ prior best), while matching or exceeding specialized baselines on conventional tasks such as scene classification and cross-modal retrieval. VLM2GeoVec unifies scalable retrieval with region-level spatial reasoning, enabling cohesive multimodal analysis in remote sensing. We will publicly release the code, checkpoints, and data upon acceptance.
CVJan 26
DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster AssessmentSara Tehrani, Yonghao Xu, Leif Haglund et al.
Timely interpretation of satellite imagery is critical for disaster response, yet existing vision-language benchmarks for remote sensing largely focus on coarse labels and image-level recognition, overlooking the functional understanding and instruction robustness required in real humanitarian workflows. We introduce DisasterInsight, a multimodal benchmark designed to evaluate vision-language models (VLMs) on realistic disaster analysis tasks. DisasterInsight restructures the xBD dataset into approximately 112K building-centered instances and supports instruction-diverse evaluation across multiple tasks, including building-function classification, damage-level and disaster-type classification, counting, and structured report generation aligned with humanitarian assessment guidelines. To establish domain-adapted baselines, we propose DI-Chat, obtained by fine-tuning existing VLM backbones on disaster-specific instruction data using parameter-efficient Low-Rank Adaptation (LoRA). Extensive experiments on state-of-the-art generic and remote-sensing VLMs reveal substantial performance gaps across tasks, particularly in damage understanding and structured report generation. DI-Chat achieves significant improvements on damage-level and disaster-type classification as well as report generation quality, while building-function classification remains challenging for all evaluated models. DisasterInsight provides a unified benchmark for studying grounded multimodal reasoning in disaster imagery.
CVMar 26, 2024
Sen2Fire: A Challenging Benchmark Dataset for Wildfire Detection using Sentinel DataYonghao Xu, Amanda Berg, Leif Haglund
Utilizing satellite imagery for wildfire detection presents substantial potential for practical applications. To advance the development of machine learning algorithms in this domain, our study introduces the \textit{Sen2Fire} dataset--a challenging satellite remote sensing dataset tailored for wildfire detection. This dataset is curated from Sentinel-2 multi-spectral data and Sentinel-5P aerosol product, comprising a total of 2466 image patches. Each patch has a size of 512$\times$512 pixels with 13 bands. Given the distinctive sensitivities of various wavebands to wildfire responses, our research focuses on optimizing wildfire detection by evaluating different wavebands and employing a combination of spectral indices, such as normalized burn ratio (NBR) and normalized difference vegetation index (NDVI). The results suggest that, in contrast to using all bands for wildfire detection, selecting specific band combinations yields superior performance. Additionally, our study underscores the positive impact of integrating Sentinel-5 aerosol data for wildfire detection. The code and dataset are available online (https://zenodo.org/records/10881058).
CVJan 20, 2025
On the Adversarial Vulnerabilities of Transfer Learning in Remote SensingTao Bai, Xingjian Tian, Yonghao Xu et al.
The use of pretrained models from general computer vision tasks is widespread in remote sensing, significantly reducing training costs and improving performance. However, this practice also introduces vulnerabilities to downstream tasks, where publicly available pretrained models can be used as a proxy to compromise downstream models. This paper presents a novel Adversarial Neuron Manipulation method, which generates transferable perturbations by selectively manipulating single or multiple neurons in pretrained models. Unlike existing attacks, this method eliminates the need for domain-specific information, making it more broadly applicable and efficient. By targeting multiple fragile neurons, the perturbations achieve superior attack performance, revealing critical vulnerabilities in deep learning models. Experiments on diverse models and remote sensing datasets validate the effectiveness of the proposed method. This low-access adversarial neuron manipulation technique highlights a significant security risk in transfer learning models, emphasizing the urgent need for more robust defenses in their design when addressing the safety-critical remote sensing tasks.
CVSep 18, 2021
Unsupervised Domain Adaptation for Semantic Segmentation via Low-level Edge Information TransferHongruixuan Chen, Chen Wu, Yonghao Xu et al.
Unsupervised domain adaptation for semantic segmentation aims to make models trained on synthetic data (source domain) adapt to real images (target domain). Previous feature-level adversarial learning methods only consider adapting models on the high-level semantic features. However, the large domain gap between source and target domains in the high-level semantic features makes accurate adaptation difficult. In this paper, we present the first attempt at explicitly using low-level edge information, which has a small inter-domain gap, to guide the transfer of semantic information. To this end, a semantic-edge domain adaptation architecture is proposed, which uses an independent edge stream to process edge information, thereby generating high-quality semantic boundaries over the target domain. Then, an edge consistency loss is presented to align target semantic predictions with produced semantic boundaries. Moreover, we further propose two entropy reweighting methods for semantic adversarial learning and self-supervised learning, respectively, which can further enhance the adaptation performance of our architecture. Comprehensive experiments on two UDA benchmark datasets demonstrate the superiority of our architecture compared with state-of-the-art methods.
CVApr 5, 2019
Fast Spatio-Temporal Residual Network for Video Super-ResolutionSheng Li, Fengxiang He, Bo Du et al.
Recently, deep learning based video super-resolution (SR) methods have achieved promising performance. To simultaneously exploit the spatial and temporal information of videos, employing 3-dimensional (3D) convolutions is a natural approach. However, straight utilizing 3D convolutions may lead to an excessively high computational complexity which restricts the depth of video SR models and thus undermine the performance. In this paper, we present a novel fast spatio-temporal residual network (FSTRN) to adopt 3D convolutions for the video SR task in order to enhance the performance while maintaining a low computational load. Specifically, we propose a fast spatio-temporal residual block (FRB) that divide each 3D filter to the product of two 3D filters, which have considerably lower dimensions. Furthermore, we design a cross-space residual learning that directly links the low-resolution space and the high-resolution space, which can greatly relieve the computational burden on the feature fusion and up-scaling parts. Extensive evaluations and comparisons on benchmark datasets validate the strengths of the proposed approach and demonstrate that the proposed network significantly outperforms the current state-of-the-art methods.