CVApr 11, 2023
UnCRtainTS: Uncertainty Quantification for Cloud Removal in Optical Satellite Time SeriesPatrick Ebel, Vivien Sainte Fare Garnot, Michael Schmitt et al.
Clouds and haze often occlude optical satellite images, hindering continuous, dense monitoring of the Earth's surface. Although modern deep learning methods can implicitly learn to ignore such occlusions, explicit cloud removal as pre-processing enables manual interpretation and allows training models when only few annotations are available. Cloud removal is challenging due to the wide range of occlusion scenarios -- from scenes partially visible through haze, to completely opaque cloud coverage. Furthermore, integrating reconstructed images in downstream applications would greatly benefit from trustworthy quality assessment. In this paper, we introduce UnCRtainTS, a method for multi-temporal cloud removal combining a novel attention-based architecture, and a formulation for multivariate uncertainty prediction. These two components combined set a new state-of-the-art performance in terms of image reconstruction on two public cloud removal datasets. Additionally, we show how the well-calibrated predicted uncertainties enable a precise control of the reconstruction quality.
CVSep 26, 2022
EOD: The IEEE GRSS Earth Observation DatabaseMichael Schmitt, Pedram Ghamisi, Naoto Yokoya et al.
In the era of deep learning, annotated datasets have become a crucial asset to the remote sensing community. In the last decade, a plethora of different datasets was published, each designed for a specific data type and with a specific task or application in mind. In the jungle of remote sensing datasets, it can be hard to keep track of what is available already. With this paper, we introduce EOD - the IEEE GRSS Earth Observation Database (EOD) - an interactive online platform for cataloguing different types of datasets leveraging remote sensing imagery.
CVOct 12, 2023
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning ApproachesKangqing Shen, Gemine Vivone, Xiaoyuan Yang et al.
Synthetic aperture radar (SAR) images are widely used in remote sensing. Interpreting SAR images can be challenging due to their intrinsic speckle noise and grayscale nature. To address this issue, SAR colorization has emerged as a research direction to colorize gray scale SAR images while preserving the original spatial information and radiometric information. However, this research field is still in its early stages, and many limitations can be highlighted. In this paper, we propose a full research line for supervised learning-based approaches to SAR colorization. Our approach includes a protocol for generating synthetic color SAR images, several baselines, and an effective method based on the conditional generative adversarial network (cGAN) for SAR colorization. We also propose numerical assessment metrics for the problem at hand. To our knowledge, this is the first attempt to propose a research line for SAR colorization that includes a protocol, a benchmark, and a complete performance evaluation. Our extensive tests demonstrate the effectiveness of our proposed cGAN-based network for SAR colorization. The code will be made publicly available.
CVAug 14, 2023Code
Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation for Pixel-wise RegressionAnton Baumann, Thomas Roßberg, Michael Schmitt
Uncertainty estimation in machine learning is paramount for enhancing the reliability and interpretability of predictive models, especially in high-stakes real-world scenarios. Despite the availability of numerous methods, they often pose a trade-off between the quality of uncertainty estimation and computational efficiency. Addressing this challenge, we present an adaptation of the Multiple-Input Multiple-Output (MIMO) framework -- an approach exploiting the overparameterization of deep neural networks -- for pixel-wise regression tasks. Our MIMO variant expands the applicability of the approach from simple image classification to broader computer vision domains. For that purpose, we adapted the U-Net architecture to train multiple subnetworks within a single model, harnessing the overparameterization in deep neural networks. Additionally, we introduce a novel procedure for synchronizing subnetwork performance within the MIMO framework. Our comprehensive evaluations of the resulting MIMO U-Net on two orthogonal datasets demonstrate comparable accuracy to existing models, superior calibration on in-distribution data, robust out-of-distribution detection capabilities, and considerable improvements in parameter size and inference time. Code available at github.com/antonbaumann/MIMO-Unet
CVOct 30, 2023
There Are No Data Like More Data- Datasets for Deep Learning in Earth ObservationMichael Schmitt, Seyed Ali Ahmadi, Yonghao Xu et al.
Carefully curated and annotated datasets are the foundation of machine learning, with particularly data-hungry deep neural networks forming the core of what is often called Artificial Intelligence (AI). Due to the massive success of deep learning applied to Earth Observation (EO) problems, the focus of the community has been largely on the development of ever-more sophisticated deep neural network architectures and training strategies largely ignoring the overall importance of datasets. For that purpose, numerous task-specific datasets have been created that were largely ignored by previously published review articles on AI for Earth observation. With this article, we want to change the perspective and put machine learning datasets dedicated to Earth observation data and applications into the spotlight. Based on a review of the historical developments, currently available resources are described and a perspective for future developments is formed. We hope to contribute to an understanding that the nature of our data is what distinguishes the Earth observation community from many other communities that apply deep learning techniques to image data, and that a detailed understanding of EO data peculiarities is among the core competencies of our discipline.
CVDec 5, 2022
MapInWild: A Remote Sensing Dataset to Address the Question What Makes Nature WildBurak Ekim, Timo T. Stomberg, Ribana Roscher et al.
Antrophonegic pressure (i.e. human influence) on the environment is one of the largest causes of the loss of biological diversity. Wilderness areas, in contrast, are home to undisturbed ecological processes. However, there is no biophysical definition of the term wilderness. Instead, wilderness is more of a philosophical or cultural concept and thus cannot be easily delineated or categorized in a technical manner. With this paper, (i) we introduce the task of wilderness mapping by means of machine learning applied to satellite imagery (ii) and publish MapInWild, a large-scale benchmark dataset curated for that task. MapInWild is a multi-modal dataset and comprises various geodata acquired and formed from a diverse set of Earth observation sensors. The dataset consists of 8144 images with a shape of 1920 x 1920 pixels and is approximately 350 GB in size. The images are weakly annotated with three classes derived from the World Database of Protected Areas - Strict Nature Reserves, Wilderness Areas, and National Parks. With the dataset, which shall serve as a testbed for developments in fields such as explainable machine learning and environmental remote sensing, we hope to contribute to a deepening of our understanding of the question "What makes nature wild?".
CVApr 5, 2023
Explaining Multimodal Data Fusion: Occlusion Analysis for Wilderness MappingBurak Ekim, Michael Schmitt
Jointly harnessing complementary features of multi-modal input data in a common latent space has been found to be beneficial long ago. However, the influence of each modality on the models decision remains a puzzle. This study proposes a deep learning framework for the modality-level interpretation of multimodal earth observation data in an end-to-end fashion. While leveraging an explainable machine learning method, namely Occlusion Sensitivity, the proposed framework investigates the influence of modalities under an early-fusion scenario in which the modalities are fused before the learning process. We show that the task of wilderness mapping largely benefits from auxiliary data such as land cover and night time light data.
CVAug 20, 2024
SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised PretrainingJonathan Prexl, Michael Schmitt
This paper introduces SenPa-MAE, a transformer architecture that encodes the sensor parameters of an observed multispectral signal into the image embeddings. SenPa-MAE can be pre-trained on imagery of different satellites with non-matching spectral or geometrical sensor characteristics. To incorporate sensor parameters, we propose a versatile sensor parameter encoding module as well as a data augmentation strategy for the diversification of the pre-training dataset. This enables the model to effectively differentiate between various sensors and gain an understanding of sensor parameters and the correlation to the observed signal. Given the rising number of Earth observation satellite missions and the diversity in their sensor specifications, our approach paves the way towards a sensor-independent Earth observation foundation model. This opens up possibilities such as cross-sensor training and sensor-independent inference.
CVDec 18, 2024Code
Distribution Shifts at Scale: Out-of-distribution Detection in Earth ObservationBurak Ekim, Girmaw Abebe Tadesse, Caleb Robinson et al.
Training robust deep learning models is crucial in Earth Observation, where globally deployed models often face distribution shifts that degrade performance, especially in low-data regions. Out-of-distribution (OOD) detection addresses this by identifying inputs that deviate from in-distribution (ID) data. However, existing methods either assume access to OOD data or compromise primary task performance, limiting real-world use. We introduce TARDIS, a post-hoc OOD detection method designed for scalable geospatial deployment. Our core innovation lies in generating surrogate distribution labels by leveraging ID data within the feature space. TARDIS takes a pre-trained model, ID data, and data from an unknown distribution (WILD), separates WILD into surrogate ID and OOD labels based on internal activations, and trains a binary classifier to detect distribution shifts. We validate on EuroSAT and xBD across 17 setups covering covariate and semantic shifts, showing near-upper-bound surrogate labeling performance in 13 cases and matching the performance of top post-hoc activation- and scoring-based methods. Finally, deploying TARDIS on Fields of the World reveals actionable insights into pre-trained model behavior at scale. The code is available at \href{https://github.com/microsoft/geospatial-ood-detection}{https://github.com/microsoft/geospatial-ood-detection}
76.1CVApr 8Code
Location Is All You Need: Continuous Spatiotemporal Neural Representations of Earth Observation DataMojgan Madadikhaljan, Jonathan Prexl, Isabelle Wittmann et al.
In this work, we present LIANet (Location Is All You Need Network), a coordinate-based neural representation that models multi-temporal spaceborne Earth observation (EO) data for a given region of interest as a continuous spatiotemporal neural field. Given only spatial and temporal coordinates, LIANet reconstructs the corresponding satellite imagery. Once pretrained, this neural representation can be adapted to various EO downstream tasks, such as semantic segmentation or pixel-wise regression, importantly, without requiring access to the original satellite data. LIANet intends to serve as a user-friendly alternative to Geospatial Foundation Models (GFMs) by eliminating the overhead of data access and preprocessing for end-users and enabling fine-tuning solely based on labels. We demonstrate the pretraining of LIANet across target areas of varying sizes and show that fine-tuning it for downstream tasks achieves competitive performance compared to training from scratch or using established GFMs. The source code and datasets are publicly available at https://github.com/mojganmadadi/LIANet/tree/v1.0.1.
CVNov 23, 2020Code
Synthesizing Optical and SAR Imagery From Land Cover Maps and Auxiliary Raster DataGerald Baier, Antonin Deschemps, Michael Schmitt et al.
We synthesize both optical RGB and synthetic aperture radar (SAR) remote sensing images from land cover maps and auxiliary raster data using generative adversarial networks (GANs). In remote sensing, many types of data, such as digital elevation models (DEMs) or precipitation maps, are often not reflected in land cover maps but still influence image content or structure. Including such data in the synthesis process increases the quality of the generated images and exerts more control on their characteristics. Spatially adaptive normalization layers fuse both inputs and are applied to a full-blown generator architecture consisting of encoder and decoder to take full advantage of the information content in the auxiliary raster data. Our method successfully synthesizes medium (10 m) and high (1 m) resolution images when trained with the corresponding data set. We show the advantage of data fusion of land cover maps and auxiliary information using mean intersection over unions (mIoUs), pixel accuracy, and Fréchet inception distances (FIDs) using pretrained U-Net segmentation models. Handpicked images exemplify how fusing information avoids ambiguities in the synthesized images. By slightly editing the input, our method can be used to synthesize realistic changes, i.e., raising the water levels. The source code is available at https://github.com/gbaier/rs_img_synth and we published the newly created high-resolution dataset at https://ieee-dataport.org/open-access/geonrw.
55.2CVApr 20
Prompting Foundation Models for Zero-Shot Ship Instance Segmentation in SAR ImageryIslam Mansour, Francescopaolo Sica, Michael Schmitt
Synthetic Aperture Radar (SAR) plays a critical role in maritime surveillance, yet deep learning for SAR analysis is limited by the lack of pixel-level annotations. This paper explores how general-purpose vision foundation models can enable zero-shot ship instance segmentation in SAR imagery, eliminating the need for pixel-level supervision. A YOLOv11-based detector trained on open SAR datasets localizes ships via bounding boxes, which then prompt the Segment Anything Model 2 (SAM2) to produce instance masks without any mask annotations. Unlike prior SAM-based SAR approaches that rely on fine tuning or adapters, our method demonstrates that spatial constraints from a SAR-trained detector alone can effectively regularize foundation model predictions. This design partially mitigates the optical-SAR domain gap and enables downstream applications such as vessel classification, size estimation, and wake analysis. Experiments on the SSDD benchmark achieve a mean IoU of 0.637 (89% of a fully supervised baseline) with an overall ship detection rate of 89.2%, confirming a scalable, annotation-efficient pathway toward foundation-model-driven SAR image understanding.
CVJun 12, 2025
HyBiomass: Global Hyperspectral Imagery Benchmark Dataset for Evaluating Geospatial Foundation Models in Forest Aboveground Biomass EstimationAaron Banze, Timothée Stassin, Nassim Ait Ali Braham et al.
Comprehensive evaluation of geospatial foundation models (Geo-FMs) requires benchmarking across diverse tasks, sensors, and geographic regions. However, most existing benchmark datasets are limited to segmentation or classification tasks, and focus on specific geographic areas. To address this gap, we introduce a globally distributed dataset for forest aboveground biomass (AGB) estimation, a pixel-wise regression task. This benchmark dataset combines co-located hyperspectral imagery (HSI) from the Environmental Mapping and Analysis Program (EnMAP) satellite and predictions of AGB density estimates derived from the Global Ecosystem Dynamics Investigation lidars, covering seven continental regions. Our experimental results on this dataset demonstrate that the evaluated Geo-FMs can match or, in some cases, surpass the performance of a baseline U-Net, especially when fine-tuning the encoder. We also find that the performance difference between the U-Net and Geo-FMs depends on the dataset size for each region and highlight the importance of the token patch size in the Vision Transformer backbone for accurate predictions in pixel-wise regression tasks. By releasing this globally distributed hyperspectral benchmark dataset, we aim to facilitate the development and evaluation of Geo-FMs for HSI applications. Leveraging this dataset additionally enables research into geographic bias and generalization capacity of Geo-FMs. The dataset and source code will be made publicly available.
CVApr 11, 2025
SARFormer -- An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar DataJonathan Prexl, Michael Recla, Michael Schmitt
This manuscript introduces SARFormer, a modified Vision Transformer (ViT) architecture designed for processing one or multiple synthetic aperture radar (SAR) images. Given the complex image geometry of SAR data, we propose an acquisition parameter encoding module that significantly guides the learning process, especially in the case of multiple images, leading to improved performance on downstream tasks. We further explore self-supervised pre-training, conduct experiments with limited labeled data, and benchmark our contribution and adaptations thoroughly in ablation experiments against a baseline, where the model is tested on tasks such as height reconstruction and segmentation. Our approach achieves up to 17% improvement in terms of RMSE over baseline models
CVApr 4, 2025
MIMRS: A Survey on Masked Image Modeling in Remote SensingShabnam Choudhury, Akhil Vasim, Michael Schmitt et al.
Masked Image Modeling (MIM) is a self-supervised learning technique that involves masking portions of an image, such as pixels, patches, or latent representations, and training models to predict the missing information using the visible context. This approach has emerged as a cornerstone in self-supervised learning, unlocking new possibilities in visual understanding by leveraging unannotated data for pre-training. In remote sensing, MIM addresses challenges such as incomplete data caused by cloud cover, occlusions, and sensor limitations, enabling applications like cloud removal, multi-modal data fusion, and super-resolution. By synthesizing and critically analyzing recent advancements, this survey (MIMRS) is a pioneering effort to chart the landscape of mask image modeling in remote sensing. We highlight state-of-the-art methodologies, applications, and future research directions, providing a foundational review to guide innovation in this rapidly evolving field.
CVJun 27, 2024
Mapping Land Naturalness from Sentinel-2 using Deep Contextual and Geographical PriorsBurak Ekim, Michael Schmitt
In recent decades, the causes and consequences of climate change have accelerated, affecting our planet on an unprecedented scale. This change is closely tied to the ways in which humans alter their surroundings. As our actions continue to impact natural areas, using satellite images to observe and measure these effects has become crucial for understanding and combating climate change. Aiming to map land naturalness on the continuum of modern human pressure, we have developed a multi-modal supervised deep learning framework that addresses the unique challenges of satellite data and the task at hand. We incorporate contextual and geographical priors, represented by corresponding coordinate information and broader contextual information, including and surrounding the immediate patch to be predicted. Our framework improves the model's predictive performance in mapping land naturalness from Sentinel-2 data, a type of multi-spectral optical satellite imagery. Recognizing that our protective measures are only as effective as our understanding of the ecosystem, quantifying naturalness serves as a crucial step toward enhancing our environmental stewardship.
CVJan 24, 2022
SEN12MS-CR-TS: A Remote Sensing Data Set for Multi-modal Multi-temporal Cloud RemovalPatrick Ebel, Yajin Xu, Michael Schmitt et al.
About half of all optical observations collected via spaceborne satellites are affected by haze or clouds. Consequently, cloud coverage affects the remote sensing practitioner's capabilities of a continuous and seamless monitoring of our planet. This work addresses the challenge of optical satellite image reconstruction and cloud removal by proposing a novel multi-modal and multi-temporal data set called SEN12MS-CR-TS. We propose two models highlighting the benefits and use cases of SEN12MS-CR-TS: First, a multi-modal multi-temporal 3D-Convolution Neural Network that predicts a cloud-free image from a sequence of cloudy optical and radar images. Second, a sequence-to-sequence translation model that predicts a cloud-free time series from a cloud-covered time series. Both approaches are evaluated experimentally, with their respective models trained and tested on SEN12MS-CR-TS. The conducted experiments highlight the contribution of our data set to the remote sensing community as well as the benefits of multi-modal and multi-temporal information to reconstruct noisy information. Our data set is available at https://patrickTUM.github.io/cloud_removal
CVNov 3, 2021
Deep-Learning-Based Single-Image Height Reconstruction from Very-High-Resolution SAR Intensity DataMichael Recla, Michael Schmitt
Originally developed in fields such as robotics and autonomous driving with image-based navigation in mind, deep learning-based single-image depth estimation (SIDE) has found great interest in the wider image analysis community. Remote sensing is no exception, as the possibility to estimate height maps from single aerial or satellite imagery bears great potential in the context of topographic reconstruction. A few pioneering investigations have demonstrated the general feasibility of single image height prediction from optical remote sensing images and motivate further studies in that direction. With this paper, we present the first-ever demonstration of deep learning-based single image height prediction for the other important sensor modality in remote sensing: synthetic aperture radar (SAR) data. Besides the adaptation of a convolutional neural network (CNN) architecture for SAR intensity images, we present a workflow for the generation of training data, and extensive experimental results for different SAR imaging modes and test sites. Since we put a particular emphasis on transferability, we are able to confirm that deep learning-based single-image height estimation is not only possible, but also transfers quite well to unseen data, even if acquired by different imaging modes and imaging parameters.
LGMay 25, 2021
There is no data like more data -- current status of machine learning datasets in remote sensingMichael Schmitt, Seyed Ali Ahmadi, Ronny Hänsch
Annotated datasets have become one of the most crucial preconditions for the development and evaluation of machine learning-based methods designed for the automated interpretation of remote sensing data. In this paper, we review the historic development of such datasets, discuss their features based on a few selected examples, and address open issues for future developments.
CVApr 1, 2021
Remote Sensing Image Classification with the SEN12MS DatasetMichael Schmitt, Yu-Lun Wu
Image classification is one of the main drivers of the rapid developments in deep learning with convolutional neural networks for computer vision. So is the analogous task of scene classification in remote sensing. However, in contrast to the computer vision community that has long been using well-established, large-scale standard datasets to train and benchmark high-capacity models, the remote sensing community still largely relies on relatively small and often application-dependend datasets, thus lacking comparability. With this letter, we present a classification-oriented conversion of the SEN12MS dataset. Using that, we provide results for several baseline models based on two standard CNN architectures and different input data configurations. Our results support the benchmarking of remote sensing image classification and provide insights to the benefit of multi-spectral data and multi-sensor data fusion over conventional RGB imagery.
CVNov 23, 2020
Multi-task Learning for Human Settlement Extent Regression and Local Climate Zone ClassificationChunping Qiu, Lukas Liebel, Lloyd H. Hughes et al.
Human Settlement Extent (HSE) and Local Climate Zone (LCZ) maps are both essential sources, e.g., for sustainable urban development and Urban Heat Island (UHI) studies. Remote sensing (RS)- and deep learning (DL)-based classification approaches play a significant role by providing the potential for global mapping. However, most of the efforts only focus on one of the two schemes, usually on a specific scale. This leads to unnecessary redundancies, since the learned features could be leveraged for both of these related tasks. In this letter, the concept of multi-task learning (MTL) is introduced to HSE regression and LCZ classification for the first time. We propose a MTL framework and develop an end-to-end Convolutional Neural Network (CNN), which consists of a backbone network for shared feature learning, attention modules for task-specific feature learning, and a weighting strategy for balancing the two tasks. We additionally propose to exploit HSE predictions as a prior for LCZ classification to enhance the accuracy. The MTL approach was extensively tested with Sentinel-2 data of 13 cities across the world. The results demonstrate that the framework is able to provide a competitive solution for both tasks.
IVSep 16, 2020
Multi-Sensor Data Fusion for Cloud Removal in Global and All-Season Sentinel-2 ImageryPatrick Ebel, Andrea Meraner, Michael Schmitt et al.
This work has been accepted by IEEE TGRS for publication. The majority of optical observations acquired via spaceborne earth imagery are affected by clouds. While there is numerous prior work on reconstructing cloud-covered information, previous studies are oftentimes confined to narrowly-defined regions of interest, raising the question of whether an approach can generalize to a diverse set of observations acquired at variable cloud coverage or in different regions and seasons. We target the challenge of generalization by curating a large novel data set for training new cloud removal approaches and evaluate on two recently proposed performance metrics of image quality and diversity. Our data set is the first publically available to contain a global sample of co-registered radar and optical observations, cloudy as well as cloud-free. Based on the observation that cloud coverage varies widely between clear skies and absolute coverage, we propose a novel model that can deal with either extremes and evaluate its performance on our proposed data set. Finally, we demonstrate the superiority of training models on real over synthetic data, underlining the need for a carefully curated data set of real observations. To facilitate future research, our data set is made available online
IVMay 16, 2020
Multi-level Feature Fusion-based CNN for Local Climate Zone Classification from Sentinel-2 Images: Benchmark Results on the So2Sat LCZ42 DatasetChunping Qiu, Xiaochong Tong, Michael Schmitt et al.
As a unique classification scheme for urban forms and functions, the local climate zone (LCZ) system provides essential general information for any studies related to urban environments, especially on a large scale. Remote sensing data-based classification approaches are the key to large-scale mapping and monitoring of LCZs. The potential of deep learning-based approaches is not yet fully explored, even though advanced convolutional neural networks (CNNs) continue to push the frontiers for various computer vision tasks. One reason is that published studies are based on different datasets, usually at a regional scale, which makes it impossible to fairly and consistently compare the potential of different CNNs for real-world scenarios. This study is based on the big So2Sat LCZ42 benchmark dataset dedicated to LCZ classification. Using this dataset, we studied a range of CNNs of varying sizes. In addition, we proposed a CNN to classify LCZs from Sentinel-2 images, Sen2LCZ-Net. Using this base network, we propose fusing multi-level features using the extended Sen2LCZ-Net-MF. With this proposed simple network architecture and the highly competitive benchmark dataset, we obtain results that are better than those obtained by the state-of-the-art CNNs, while requiring less computation with fewer layers and parameters. Large-scale LCZ classification examples of completely unseen areas are presented, demonstrating the potential of our proposed Sen2LCZ-Net-MF as well as the So2Sat LCZ42 dataset. We also intensively investigated the influence of network depth and width and the effectiveness of the design choices made for Sen2LCZ-Net-MF. Our work will provide important baselines for future CNN-based algorithm developments for both LCZ classification and other urban land cover land use classification.
CVFeb 19, 2020
Weakly Supervised Semantic Segmentation of Satellite Images for Land Cover Mapping -- Challenges and OpportunitiesMichael Schmitt, Jonathan Prexl, Patrick Ebel et al.
Fully automatic large-scale land cover mapping belongs to the core challenges addressed by the remote sensing community. Usually, the basis of this task is formed by (supervised) machine learning models. However, in spite of recent growth in the availability of satellite observations, accurate training data remains comparably scarce. On the other hand, numerous global land cover products exist and can be accessed often free-of-charge. Unfortunately, these maps are typically of a much lower resolution than modern day satellite imagery. Besides, they always come with a significant amount of noise, as they cannot be considered ground truth, but are products of previous (semi-)automatic prediction tasks. Therefore, this paper seeks to make a case for the application of weakly supervised learning strategies to get the most out of available data sources and achieve progress in high-resolution large-scale land cover mapping. Challenges and opportunities are discussed based on the SEN12MS dataset, for which also some baseline results are shown. These baselines indicate that there is still a lot of potential for dedicated approaches designed to deal with remote sensing-specific forms of weak supervision.
CVDec 19, 2019
So2Sat LCZ42: A Benchmark Dataset for Global Local Climate Zones ClassificationXiao Xiang Zhu, Jingliang Hu, Chunping Qiu et al.
Access to labeled reference data is one of the grand challenges in supervised machine learning endeavors. This is especially true for an automated analysis of remote sensing images on a global scale, which enables us to address global challenges such as urbanization and climate change using state-of-the-art machine learning techniques. To meet these pressing needs, especially in urban research, we provide open access to a valuable benchmark dataset named "So2Sat LCZ42," which consists of local climate zone (LCZ) labels of about half a million Sentinel-1 and Sentinel-2 image patches in 42 urban agglomerations (plus 10 additional smaller areas) across the globe. This dataset was labeled by 15 domain experts following a carefully designed labeling work flow and evaluation process over a period of six months. As rarely done in other labeled remote sensing dataset, we conducted rigorous quality assessment by domain experts. The dataset achieved an overall confidence of 85%. We believe this LCZ dataset is a first step towards an unbiased globallydistributed dataset for urban growth monitoring using machine learning methods, because LCZ provide a rather objective measure other than many other semantic land use and land cover classifications. It provides measures of the morphology, compactness, and height of urban areas, which are less dependent on human and culture. This dataset can be accessed from http://doi.org/10.14459/2018mp1483140.
CVJun 18, 2019
SEN12MS -- A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data FusionMichael Schmitt, Lloyd Haydn Hughes, Chunping Qiu et al.
The availability of curated large-scale training data is a crucial factor for the development of well-generalizing deep learning methods for the extraction of geoinformation from multi-sensor remote sensing imagery. While quite some datasets have already been published by the community, most of them suffer from rather strong limitations, e.g. regarding spatial coverage, diversity or simply number of available samples. Exploiting the freely available data acquired by the Sentinel satellites of the Copernicus program implemented by the European Space Agency, as well as the cloud computing facilities of Google Earth Engine, we provide a dataset consisting of 180,662 triplets of dual-pol synthetic aperture radar (SAR) image patches, multi-spectral Sentinel-2 image patches, and MODIS land cover maps. With all patches being fully georeferenced at a 10 m ground sampling distance and covering all inhabited continents during all meteorological seasons, we expect the dataset to support the community in developing sophisticated deep learning-based approaches for common tasks such as scene classification or semantic segmentation for land cover mapping.
CVJul 4, 2018
The SEN1-2 Dataset for Deep Learning in SAR-Optical Data FusionMichael Schmitt, Lloyd Haydn Hughes, Xiao Xiang Zhu
While deep learning techniques have an increasing impact on many technical fields, gathering sufficient amounts of training data is a challenging problem in remote sensing. In particular, this holds for applications involving data from multiple sensors with heterogeneous characteristics. One example for that is the fusion of synthetic aperture radar (SAR) data and optical imagery. With this paper, we publish the SEN1-2 dataset to foster deep learning research in SAR-optical data fusion. SEN1-2 comprises 282,384 pairs of corresponding image patches, collected from across the globe and throughout all meteorological seasons. Besides a detailed description of the dataset, we show exemplary results for several possible applications, such as SAR image colorization, SAR-optical image matching, and creation of artificial optical images from SAR input data. Since SEN1-2 is the first large open dataset of this kind, we believe it will support further developments in the field of deep learning for remote sensing as well as multi-sensor data fusion.
IVJan 25, 2018
Identifying Corresponding Patches in SAR and Optical Images with a Pseudo-Siamese CNNLloyd H. Hughes, Michael Schmitt, Lichao Mou et al.
In this letter, we propose a pseudo-siamese convolutional neural network (CNN) architecture that enables to solve the task of identifying corresponding patches in very-high-resolution (VHR) optical and synthetic aperture radar (SAR) remote sensing imagery. Using eight convolutional layers each in two parallel network streams, a fully connected layer for the fusion of the features learned in each stream, and a loss function based on binary cross-entropy, we achieve a one-hot indication if two patches correspond or not. The network is trained and tested on an automatically generated dataset that is based on a deterministic alignment of SAR and optical imagery via previously reconstructed and subsequently co-registered 3D point clouds. The satellite images, from which the patches comprising our dataset are extracted, show a complex urban scene containing many elevated objects (i.e. buildings), thus providing one of the most difficult experimental environments. The achieved results show that the network is able to predict corresponding patches with high accuracy, thus indicating great potential for further development towards a generalized multi-sensor key-point matching procedure. Index Terms-synthetic aperture radar (SAR), optical imagery, data fusion, deep learning, convolutional neural networks (CNN), image matching, deep matching