CVApr 20, 2022
Attentive Dual Stream Siamese U-net for Flood Detection on Multi-temporal Sentinel-1 DataRitu Yadav, Andrea Nascetti, Yifang Ban
Due to climate and land-use change, natural disasters such as flooding have been increasing in recent years. Timely and reliable flood detection and mapping can help emergency response and disaster management. In this work, we propose a flood detection network using bi-temporal SAR acquisitions. The proposed segmentation network has an encoder-decoder architecture with two Siamese encoders for pre and post-flood images. The network's feature maps are fused and enhanced using attention blocks to achieve more accurate detection of the flooded areas. Our proposed network is evaluated on publicly available Sen1Flood11 benchmark dataset. The network outperformed the existing state-of-the-art (uni-temporal) flood detection method by 6\% IOU. The experiments highlight that the combination of bi-temporal SAR data with an effective network architecture achieves more accurate flood detection than uni-temporal methods.
CVApr 26, 2022
Urban Change Detection Using a Dual-Task Siamese Network and Semi-Supervised LearningSebastian Hafner, Yifang Ban, Andrea Nascetti
In this study, a Semi-Supervised Learning (SSL) method for improving urban change detection from bi-temporal image pairs was presented. The proposed method adapted a Dual-Task Siamese Difference network that not only predicts changes with the difference decoder, but also segments buildings for both images with a semantics decoder. First, the architecture was modified to produce a second change prediction derived from the semantics predictions. Second, SSL was adopted to improve supervised change detection. For unlabeled data, we introduced a loss that encourages the network to predict consistent changes across the two change outputs. The proposed method was tested on urban change detection using the SpaceNet7 dataset. SSL achieved improved results compared to three fully supervised benchmarks.
CVJul 3, 2023
A CNN regression model to estimate buildings height maps using Sentinel-1 SAR and Sentinel-2 MSI time seriesRitu Yadav, Andrea Nascetti, Yifang Ban
Accurate estimation of building heights is essential for urban planning, infrastructure management, and environmental analysis. In this study, we propose a supervised Multimodal Building Height Regression Network (MBHR-Net) for estimating building heights at 10m spatial resolution using Sentinel-1 (S1) and Sentinel-2 (S2) satellite time series. S1 provides Synthetic Aperture Radar (SAR) data that offers valuable information on building structures, while S2 provides multispectral data that is sensitive to different land cover types, vegetation phenology, and building shadows. Our MBHR-Net aims to extract meaningful features from the S1 and S2 images to learn complex spatio-temporal relationships between image patterns and building heights. The model is trained and tested in 10 cities in the Netherlands. Root Mean Squared Error (RMSE), Intersection over Union (IOU), and R-squared (R2) score metrics are used to evaluate the performance of the model. The preliminary results (3.73m RMSE, 0.95 IoU, 0.61 R2) demonstrate the effectiveness of our deep learning model in accurately estimating building heights, showcasing its potential for urban planning, environmental impact analysis, and other related applications.
CVApr 26, 2022
Building Change Detection using Multi-Temporal Airborne LiDAR DataRitu Yadav, Andrea Nascetti, Yifang Ban
Building change detection is essential for monitoring urbanization, disaster assessment, urban planning and frequently updating the maps. 3D structure information from airborne light detection and ranging (LiDAR) is very effective for detecting urban changes. But the 3D point cloud from airborne LiDAR(ALS) holds an enormous amount of unordered and irregularly sparse information. Handling such data is tricky and consumes large memory for processing. Most of this information is not necessary when we are looking for a particular type of urban change. In this study, we propose an automatic method that reduces the 3D point clouds into a much smaller representation without losing the necessary information required for detecting Building changes. The method utilizes the Deep Learning(DL) model U-Net for segmenting the buildings from the background. Produced segmentation maps are then processed further for detecting changes and the results are refined using morphological methods. For the change detection task, we used multi-temporal airborne LiDAR data. The data is acquired over Stockholm in the years 2017 and 2019. The changes in buildings are classified into four types: 'newly built', 'demolished', 'taller' and 'shorter'. The detected changes are visualized in one map for better interpretation.
CVDec 7, 2022
Unsupervised Flood Detection on SAR Time SeriesRitu Yadav, Andrea Nascetti, Hossein Azizpour et al.
Human civilization has an increasingly powerful influence on the earth system. Affected by climate change and land-use change, natural disasters such as flooding have been increasing in recent years. Earth observations are an invaluable source for assessing and mitigating negative impacts. Detecting changes from Earth observation data is one way to monitor the possible impact. Effective and reliable Change Detection (CD) methods can help in identifying the risk of disaster events at an early stage. In this work, we propose a novel unsupervised CD method on time series Synthetic Aperture Radar~(SAR) data. Our proposed method is a probabilistic model trained with unsupervised learning techniques, reconstruction, and contrastive learning. The change map is generated with the help of the distribution difference between pre-incident and post-incident data. Our proposed CD model is evaluated on flood detection data. We verified the efficacy of our model on 8 different flood sites, including three recent flood events from Copernicus Emergency Management Services and six from the Sen1Floods11 dataset. Our proposed model achieved an average of 64.53\% Intersection Over Union(IoU) value and 75.43\% F1 score. Our achieved IoU score is approximately 6-27\% and F1 score is approximately 7-22\% better than the compared unsupervised and supervised existing CD methods. The results and extensive discussion presented in the study show the effectiveness of the proposed unsupervised CD method.
CVJun 1, 2023
Multi-Modal Deep Learning for Multi-Temporal Urban Mapping With a Partly Missing Optical ModalitySebastian Hafner, Yifang Ban
This paper proposes a novel multi-temporal urban mapping approach using multi-modal satellite data from the Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 MultiSpectral Instrument (MSI) missions. In particular, it focuses on the problem of a partly missing optical modality due to clouds. The proposed model utilizes two networks to extract features from each modality separately. In addition, a reconstruction network is utilized to approximate the optical features based on the SAR data in case of a missing optical modality. Our experiments on a multi-temporal urban mapping dataset with Sentinel-1 SAR and Sentinel-2 MSI data demonstrate that the proposed method outperforms a multi-modal approach that uses zero values as a replacement for missing optical data, as well as a uni-modal SAR-based approach. Therefore, the proposed method is effective in exploiting multi-modal data, if available, but it also retains its effectiveness in case the optical modality is missing.
CVMar 15, 2023
Mapping Urban Population Growth from Sentinel-2 MSI and Census Data Using Deep Learning: A Case Study in Kigali, RwandaSebastian Hafner, Stefanos Georganos, Theodomir Mugiraneza et al.
To better understand current trends of urban population growth in Sub-Saharan Africa, high-quality spatiotemporal population estimates are necessary. While the joint use of remote sensing and deep learning has achieved promising results for population distribution estimation, most of the current work focuses on fine-scale spatial predictions derived from single date census, thereby neglecting temporal analyses. In this work, we focus on evaluating how deep learning change detection techniques can unravel temporal population dynamics at short intervals. Since Post-Classification Comparison (PCC) methods for change detection are known to propagate the error of the individual maps, we propose an end-to-end population growth mapping method. Specifically, a ResNet encoder, pretrained on a population mapping task with Sentinel-2 MSI data, was incorporated into a Siamese network. The Siamese network was trained at the census level to accurately predict population change. The effectiveness of the proposed method is demonstrated in Kigali, Rwanda, for the time period 2016-2020, using bi-temporal Sentinel-2 data. Compared to PCC, the Siamese network greatly reduced errors in population change predictions at the census level. These results show promise for future remote sensing-based population growth mapping endeavors.
CVJun 15, 2023
Context-Aware Change Detection With Semi-Supervised LearningRitu Yadav, Andrea Nascetti, Yifang Ban
Change detection using earth observation data plays a vital role in quantifying the impact of disasters in affected areas. While data sources like Sentinel-2 provide rich optical information, they are often hindered by cloud cover, limiting their usage in disaster scenarios. However, leveraging pre-disaster optical data can offer valuable contextual information about the area such as landcover type, vegetation cover, soil types, enabling a better understanding of the disaster's impact. In this study, we develop a model to assess the contribution of pre-disaster Sentinel-2 data in change detection tasks, focusing on disaster-affected areas. The proposed Context-Aware Change Detection Network (CACDN) utilizes a combination of pre-disaster Sentinel-2 data, pre and post-disaster Sentinel-1 data and ancillary Digital Elevation Models (DEM) data. The model is validated on flood and landslide detection and evaluated using three metrics: Area Under the Precision-Recall Curve (AUPRC), Intersection over Union (IoU), and mean IoU. The preliminary results show significant improvement (4\%, AUPRC, 3-7\% IoU, 3-6\% mean IoU) in model's change detection capabilities when incorporated with pre-disaster optical data reflecting the effectiveness of using contextual information for accurate flood and landslide detection.
CVApr 11, 2023
Investigating Imbalances Between SAR and Optical Utilization for Multi-Modal Urban MappingSebastian Hafner, Yifang Ban, Andrea Nascetti
Accurate urban maps provide essential information to support sustainable urban development. Recent urban mapping methods use multi-modal deep neural networks to fuse Synthetic Aperture Radar (SAR) and optical data. However, multi-modal networks may rely on just one modality due to the greedy nature of learning. In turn, the imbalanced utilization of modalities can negatively affect the generalization ability of a network. In this paper, we investigate the utilization of SAR and optical data for urban mapping. To that end, a dual-branch network architecture using intermediate fusion modules to share information between the uni-modal branches is utilized. A cut-off mechanism in the fusion modules enables the stopping of information flow between the branches, which is used to estimate the network's dependence on SAR and optical data. While our experiments on the SEN12 Global Urban Mapping dataset show that good performance can be achieved with conventional SAR-optical data fusion (F1 score = 0.682 $\pm$ 0.014), we also observed a clear under-utilization of optical data. Therefore, future work is required to investigate whether a more balanced utilization of SAR and optical data can lead to performance improvements.
CVMay 6Code
Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 DataAli Shibli, Andrea Nascetti, Yifang Ban
Wildfire burned-area mapping is essential for damage assessment, emissions modeling, and understanding fire-climate interactions across diverse ecological regions. Recent geospatial foundation models provide strong general-purpose representations for satellite imagery, yet there is still no clear understanding of how to efficiently adapt these models for downstream Earth observation tasks, particularly under geographic and temporal domain shift. This study evaluates three state-of-the-art Geospatial Foundation Models (GFMs) - Terramind, DINOv3, and Prithvi-v2 - for burned-area mapping across the United States and Canada using Sentinel-2 data. Leveraging 3,820 wildfire events from 2017-2023, we conduct spatial and temporal generalization tests across diverse biomes. We systematically compare full fine-tuning, decoder-only fine-tuning, and Low-Rank Adaptation (LoRA) for adapting each model. Across all experiments, LoRA provides the strongest cross-domain generalization while updating less than 1% of parameters, demonstrating a favorable trade-off between accuracy and efficiency. Prithvi-v2 with LoRA achieves the highest overall accuracy and the largest improvement compared to full fine-tuning. These findings indicate that geospatial foundation models, when adapted using lightweight parameter-efficient methods such as LoRA, offer a robust and scalable solution for large-scale burned-area mapping. Code is available at https://github.com/alishibli97/wildfire-lora-gfm.
CVDec 5, 2024Code
PANGAEA: A Global and Inclusive Benchmark for Geospatial Foundation ModelsValerio Marsocci, Yuru Jia, Georges Le Bellier et al.
Geospatial Foundation Models (GFMs) have emerged as powerful tools for extracting representations from Earth observation data, but their evaluation remains inconsistent and narrow. Existing works often evaluate on suboptimal downstream datasets and tasks, that are often too easy or too narrow, limiting the usefulness of the evaluations to assess the real-world applicability of GFMs. Additionally, there is a distinct lack of diversity in current evaluation protocols, which fail to account for the multiplicity of image resolutions, sensor types, and temporalities, which further complicates the assessment of GFM performance. In particular, most existing benchmarks are geographically biased towards North America and Europe, questioning the global applicability of GFMs. To overcome these challenges, we introduce PANGAEA, a standardized evaluation protocol that covers a diverse set of datasets, tasks, resolutions, sensor modalities, and temporalities. It establishes a robust and widely applicable benchmark for GFMs. We evaluate the most popular GFMs openly available on this benchmark and analyze their performance across several domains. In particular, we compare these models to supervised baselines (e.g. UNet and vanilla ViT), and assess their effectiveness when faced with limited labeled data. Our findings highlight the limitations of GFMs, under different scenarios, showing that they do not consistently outperform supervised models. PANGAEA is designed to be highly extensible, allowing for the seamless inclusion of new datasets, models, and tasks in future research. By releasing the evaluation code and benchmark, we aim to enable other researchers to replicate our experiments and build upon our work, fostering a more principled evaluation protocol for large pre-trained geospatial models. The code is available at https://github.com/VMarsocci/pangaea-bench.
CVMar 26
GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote SensingXuran Hu, Zhitong Xiong, Zhongcheng Hong et al.
Current Large Multimodal Models (LMMs) in Earth Observation typically neglect the critical "vertical" dimension, limiting their reasoning capabilities in complex remote sensing geometries and disaster scenarios where physical spatial structures often outweigh planar visual textures. To bridge this gap, we introduce a comprehensive evaluation framework dedicated to height-aware remote sensing understanding. First, to overcome the severe scarcity of annotated data, we develop a scalable, VLM-driven data generation pipeline utilizing systematic prompt engineering and metadata extraction. This pipeline constructs two complementary benchmarks: GeoHeight-Bench for relative height analysis, and a more challenging GeoHeight-Bench+ for holistic, terrain-aware reasoning. Furthermore, to validate the necessity of height perception, we propose GeoHeightChat, the first height-aware remote sensing LMM baseline. Serving as a strong proof of concept, our baseline demonstrates that synergizing visual semantics with implicitly injected height geometric features effectively mitigates the "vertical blind spot", successfully unlocking a new paradigm of interactive height reasoning in existing optical models.
CVApr 30
Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change DetectionAli Shibli, Andrea Nascetti, Yifang Ban
Semantic segmentation and change detection are two fundamental challenges in remote sensing, requiring models to capture either spatial semantics or temporal differences from satellite imagery. Existing deep learning models often struggle with temporal inconsistencies or in capturing fine-grained spatial structures, require extensive pretraining, and offer limited interpretability - especially in real-world remote sensing scenarios. Recent advances in diffusion models show that Gaussian noise can be systematically leveraged to learn expressive data representations through denoising. Motivated by this, we investigate whether the noise process in diffusion models can be effectively utilized for discriminative tasks. We propose Noise2Map, a unified diffusion-based framework that repurposes the denoising process for fast, end-to-end discriminative learning. Unlike prior work that uses diffusion only for generation or feature extraction, Noise2Map directly predicts semantic or change maps using task-specific noise schedules and timestep conditioning, avoiding the costly sampling procedures of traditional diffusion models. The model is pretrained via self-supervised denoising and fine-tuned with supervision, enabling both interpretability and robustness. Our architecture supports both tasks (SS and CD) through a shared backbone and task-specific noise schedulers. Extensive evaluations on the SpaceNet7, WHU, and xView2 buildings damaged by wildfires datasets demonstrate that Noise2Map ranks on average 1st among seven models on semantic segmentation and 1st on change detection by a cross-dataset rank metric (average F1 primary, IoU tie-break). Ablation studies highlight the robustness of our model against different training noise schedulers and timestep control in the diffusion process, as well as the ability of the model to perform multi-task learning.
CVDec 16, 2024
TS-SatFire: A Multi-Task Satellite Image Time-Series Dataset for Wildfire Detection and PredictionYu Zhao, Sebastian Gerard, Yifang Ban
Wildfire monitoring and prediction are essential for understanding wildfire behaviour. With extensive Earth observation data, these tasks can be integrated and enhanced through multi-task deep learning models. We present a comprehensive multi-temporal remote sensing dataset for active fire detection, daily wildfire monitoring, and next-day wildfire prediction. Covering wildfire events in the contiguous U.S. from January 2017 to October 2021, the dataset includes 3552 surface reflectance images and auxiliary data such as weather, topography, land cover, and fuel information, totalling 71 GB. The lifecycle of each wildfire is documented, with labels for active fires (AF) and burned areas (BA), supported by manual quality assurance of AF and BA test labels. The dataset supports three tasks: a) active fire detection, b) daily burned area mapping, and c) wildfire progression prediction. Detection tasks use pixel-wise classification of multi-spectral, multi-temporal images, while prediction tasks integrate satellite and auxiliary data to model fire dynamics. This dataset and its benchmarks provide a foundation for advancing wildfire research using deep learning.
CVDec 16, 2024
RADARSAT Constellation Mission Compact Polarisation SAR Data for Burned Area Mapping with Deep LearningYu Zhao, Yifang Ban
Monitoring wildfires has become increasingly critical due to the sharp rise in wildfire incidents in recent years. Optical satellites like Sentinel-2 and Landsat are extensively utilized for mapping burned areas. However, the effectiveness of optical sensors is compromised by clouds and smoke, which obstruct the detection of burned areas. Thus, satellites equipped with Synthetic Aperture Radar (SAR), such as dual-polarization Sentinel-1 and quad-polarization RADARSAT-1/-2 C-band SAR, which can penetrate clouds and smoke, are investigated for mapping burned areas. However, there is limited research on using compact polarisation (compact-pol) C-band RADARSAT Constellation Mission (RCM) SAR data for this purpose. This study aims to investigate the capacity of compact polarisation RCM data for burned area mapping through deep learning. Compact-pol m-chi decomposition and Compact-pol Radar Vegetation Index (CpRVI) are derived from the RCM Multi-look Complex product. A deep-learning-based processing pipeline incorporating ConvNet-based and Transformer-based models is applied for burned area mapping, with three different input settings: using only log-ratio dual-polarization intensity images images, using only compact-pol decomposition plus CpRVI, and using all three data sources. The results demonstrate that compact-pol m-chi decomposition and CpRVI images significantly complement log-ratio images for burned area mapping. The best-performing Transformer-based model, UNETR, trained with log-ratio, m-chi decomposition, and CpRVI data, achieved an F1 Score of 0.718 and an IoU Score of 0.565, showing a notable improvement compared to the same model trained using only log-ratio images.
CVJun 25, 2024
Continuous Urban Change Detection from Satellite Image Time Series with Temporal Feature Refinement and Multi-Task IntegrationSebastian Hafner, Heng Fang, Hossein Azizpour et al.
Urbanization advances at unprecedented rates, leading to negative environmental and societal impacts. Remote sensing can help mitigate these effects by supporting sustainable development strategies with accurate information on urban growth. Deep learning-based methods have achieved promising urban change detection results from optical satellite image pairs using convolutional neural networks (ConvNets), transformers, and a multi-task learning setup. However, bi-temporal methods are limited for continuous urban change detection, i.e., the detection of changes in consecutive image pairs of satellite image time series (SITS), as they fail to fully exploit multi-temporal data (> 2 images). Existing multi-temporal change detection methods, on the other hand, collapse the temporal dimension, restricting their ability to capture continuous urban changes. Additionally, multi-task learning methods lack integration approaches that combine change and segmentation outputs. To address these challenges, we propose a continuous urban change detection framework incorporating two key modules. The temporal feature refinement (TFR) module employs self-attention to improve ConvNet-based multi-temporal building representations. The temporal dimension is preserved in the TFR module, enabling the detection of continuous changes. The multi-task integration (MTI) module utilizes Markov networks to find an optimal building map time series based on segmentation and dense change outputs. The proposed framework effectively identifies urban changes based on high-resolution SITS acquired by the PlanetScope constellation (F1 score 0.551), Gaofen-2 (F1 score 0.440), and WorldView-2 (F1 score 0.543). Moreover, our experiments on three challenging datasets demonstrate the effectiveness of the proposed framework compared to bi-temporal and multi-temporal urban change detection and segmentation methods.