10.0CVApr 21
Winner of CVPR2026 NTIRE Challenge on Image Shadow Removal: Semantic and Geometric Guidance for Shadow Removal via Cascaded RefinementLorenzo Beltrame, Jules Salzinger, Filip Svoboda et al.
We present a three-stage progressive shadow-removal pipeline for the CVPR2026 NTIRE WSRD+ challenge. Built on OmniSR, our method treats deshadowing as iterative direct refinement, where later stages correct residual artefacts left by earlier predictions. The model combines RGB appearance with frozen DINOv2 semantic guidance and geometric cues from monocular depth and surface normals, reused across all stages. To stabilise multi-stage optimisation, we introduce a contraction-constrained objective that encourages non-increasing reconstruction error across the cascade. A staged training pipeline transfers from earlier WSRD pretraining to WSRD+ supervision and final WSRD+ 2026 adaptation with cosine-annealed checkpoint ensembling. On the official WSRD+ 2026 hidden test set, our final ensemble achieved 26.680 PSNR, 0.8740 SSIM, 0.0578 LPIPS, and 26.135 FID, ranked first overall, and won the NTIRE 2026 Image Shadow Removal Challenge. The strong performance of the proposed model is further validated on the ISTD+ and UAV-SC+ datasets.
LGJul 24, 2024Code
EuroCropsML: A Time Series Benchmark Dataset For Few-Shot Crop Type ClassificationJoana Reuss, Jan Macdonald, Simon Becker et al.
We introduce EuroCropsML, an analysis-ready remote sensing machine learning dataset for time series crop type classification of agricultural parcels in Europe. It is the first dataset designed to benchmark transnational few-shot crop type classification algorithms that supports advancements in algorithmic development and research comparability. It comprises 706 683 multi-class labeled data points across 176 classes, featuring annual time series of per-parcel median pixel values from Sentinel-2 L1C data for 2021, along with crop type labels and spatial coordinates. Based on the open-source EuroCrops collection, EuroCropsML is publicly available on Zenodo.
LGOct 10, 2023
Harnessing Administrative Data Inventories to Create a Reliable Transnational Reference Database for Crop Type MonitoringMaja Schneider, Marco Körner
With leaps in machine learning techniques and their applicationon Earth observation challenges has unlocked unprecedented performance across the domain. While the further development of these methods was previously limited by the availability and volume of sensor data and computing resources, the lack of adequate reference data is now constituting new bottlenecks. Since creating such ground-truth information is an expensive and error-prone task, new ways must be devised to source reliable, high-quality reference data on large scales. As an example, we showcase E URO C ROPS, a reference dataset for crop type classification that aggregates and harmonizes administrative data surveyed in different countries with the goal of transnational interoperability.
LGOct 10, 2023
XAI for Early Crop ClassificationAyshah Chan, Maja Schneider, Marco Körner
We propose an approach for early crop classification through identifying important timesteps with eXplainable AI (XAI) methods. Our approach consists of training a baseline crop classification model to carry out layer-wise relevance propagation (LRP) so that the salient time step can be identified. We chose a selected number of such important time indices to create the bounding region of the shortest possible classification timeframe. We identified the period 21st April 2019 to 9th August 2019 as having the best trade-off in terms of accuracy and earliness. This timeframe only suffers a 0.75% loss in accuracy as compared to using the full timeseries. We observed that the LRP-derived important timesteps also highlight small details in input values that differentiates between different classes and
LGMay 28, 2019Code
BreizhCrops: A Time Series Dataset for Crop Type MappingMarc Rußwurm, Charlotte Pelletier, Maximilian Zollner et al.
We present Breizhcrops, a novel benchmark dataset for the supervised classification of field crops from satellite time series. We aggregated label data and Sentinel-2 top-of-atmosphere as well as bottom-of-atmosphere time series in the region of Brittany (Breizh in local language), north-east France. We compare seven recently proposed deep neural networks along with a Random Forest baseline. The dataset, model (re-)implementations and pre-trained model weights are available at the associated GitHub repository (https://github.com/dl4sits/BreizhCrops) that has been designed with applicability for practitioners in mind. We plan to maintain the repository with additional data and welcome contributions of novel methods to build a state-of-the-art benchmark on methods for crop type mapping.
CVOct 2, 2018Code
FutureGAN: Anticipating the Future Frames of Video Sequences using Spatio-Temporal 3d Convolutions in Progressively Growing GANsSandra Aigner, Marco Körner
We introduce a new encoder-decoder GAN model, FutureGAN, that predicts future frames of a video sequence conditioned on a sequence of past frames. During training, the networks solely receive the raw pixel values as an input, without relying on additional constraints or dataset specific conditions. To capture both the spatial and temporal components of a video sequence, spatio-temporal 3d convolutions are used in all encoder and decoder modules. Further, we utilize concepts of the existing progressively growing GAN (PGGAN) that achieves high-quality results on generating high-resolution single images. The FutureGAN model extends this concept to the complex task of video prediction. We conducted experiments on three different datasets, MovingMNIST, KTH Action, and Cityscapes. Our results show that the model learned representations to transform the information of an input sequence into a plausible future sequence effectively for all three datasets. The main advantage of the FutureGAN framework is that it is applicable to various different datasets without additional changes, whilst achieving stable results that are competitive to the state-of-the-art in video prediction. Our code is available at https://github.com/TUM-LMF/FutureGAN.
7.2CVMay 5
deSEO: Physics-Aware Dataset Creation for High-Resolution Satellite Image Shadow RemovalLorenzo Beltrame, Jules Salzinger, Filip Svoboda et al.
Shadows cast by terrain and tall structures remain a major obstacle for high-resolution satellite image analysis, degrading classification, detection, and 3D reconstruction performance. Public resources offering geometry-consistent paired shadow/shadow-free satellite imagery are essentially missing, and most Earth-observation datasets are designed for shadow detection or 3D modelling rather than removal. Existing deep shadow-removal datasets either target ground-level or aerial scenes or rely on unpaired and weakly supervised formulations rather than explicit satellite pairs. We address this gap with deSEO, a geometry-aware and physics-informed methodology that, to the best of our knowledge, is the first to derive paired supervision for satellite shadow removal from the S-EO shadow detection dataset through a fully replicable pipeline. For each tile, deSEO selects a minimally shadowed acquisition as a weak reference and pairs it with shadowed counterparts using temporal and geometric filtering, Jacobian-based orientation normalisation, and LoFTR-RANSAC registration. A per-pixel validity mask restricts learning to reliably aligned regions, enabling supervision despite residual off-nadir parallax. In addition to this paired dataset, we develop a DSM-aware deshadowing model that combines residual translation, perceptual objectives, and mask-constrained adversarial learning. In contrast, a direct adaptation of a UAV-based SRNet/pix2pix architecture fails to converge under satellite viewpoint variability. Our model consistently reduces the visual impact of cast shadows across diverse illumination and viewing conditions, achieving improved structural and perceptual fidelity on held-out scenes. deSEO therefore provides the first reproducible, geometry-aware paired dataset and baseline for shadow removal in satellite Earth observation.
LGNov 20, 2025
Mind the Gap: Bridging Prior Shift in Realistic Few-Shot Crop-Type ClassificationJoana Reuss, Ekaterina Gikalo, Marco Körner
Real-world agricultural distributions often suffer from severe class imbalance, typically following a long-tailed distribution. Labeled datasets for crop-type classification are inherently scarce and remain costly to obtain. When working with such limited data, training sets are frequently constructed to be artificially balanced -- in particular in the case of few-shot learning -- failing to reflect real-world conditions. This mismatch induces a shift between training and test label distributions, degrading real-world generalization. To address this, we propose Dirichlet Prior Augmentation (DirPA), a novel method that simulates an unknown label distribution skew of the target domain proactively during model training. Specifically, we model the real-world distribution as Dirichlet-distributed random variables, effectively performing a prior augmentation during few-shot learning. Our experiments show that DirPA successfully shifts the decision boundary and stabilizes the training process by acting as a dynamic feature regularizer.
LGApr 15, 2025
Benchmarking for Practice: Few-Shot Time-Series Crop-Type Classification on the EuroCropsML DatasetJoana Reuss, Jan Macdonald, Simon Becker et al.
Accurate crop-type classification from satellite time series is essential for agricultural monitoring. While various machine learning algorithms have been developed to enhance performance on data-scarce tasks, their evaluation often lacks real-world scenarios. Consequently, their efficacy in challenging practical applications has not yet been profoundly assessed. To facilitate future research in this domain, we present the first comprehensive benchmark for evaluating supervised and SSL methods for crop-type classification under real-world conditions. This benchmark study relies on the EuroCropsML time-series dataset, which combines farmer-reported crop data with Sentinel-2 satellite observations from Estonia, Latvia, and Portugal. Our findings indicate that MAML-based meta-learning algorithms achieve slightly higher accuracy compared to supervised transfer learning and SSL methods. However, compared to simpler transfer learning, the improvement of meta-learning comes at the cost of increased computational demands and training time. Moreover, supervised methods benefit most when pre-trained and fine-tuned on geographically close regions. In addition, while SSL generally lags behind meta-learning, it demonstrates advantages over training from scratch, particularly in capturing fine-grained features essential for real-world crop-type classification, and also surpasses standard transfer learning. This highlights its practical value when labeled pre-training crop data is scarce. Our insights underscore the trade-offs between accuracy and computational demand in selecting supervised machine learning methods for real-world crop-type classification tasks and highlight the difficulties of knowledge transfer across diverse geographic regions. Furthermore, they demonstrate the practical value of SSL approaches when labeled pre-training crop data is scarce.
IVJun 14, 2021
EuroCrops: A Pan-European Dataset for Time Series Crop Type ClassificationMaja Schneider, Amelie Broszeit, Marco Körner
We present EuroCrops, a dataset based on self-declared field annotations for training and evaluating methods for crop type classification and mapping, together with its process of acquisition and harmonisation. By this, we aim to enrich the research efforts and discussion for data-driven land cover classification via Earth observation and remote sensing. Additionally, through inclusion of self-declarations gathered in the scope of subsidy control from all countries of the European Union (EU), this dataset highlights the difficulties and pitfalls one comes across when operating on a transnational level. We, therefore, also introduce a new taxonomy scheme, HCAT-ID, that aspires to capture all the aspects of reference data originating from administrative and agency databases. To address researchers from both the remote sensing and the computer vision and machine learning communities, we publish the dataset in different formats and processing levels.
CVNov 23, 2020
Multi-task Learning for Human Settlement Extent Regression and Local Climate Zone ClassificationChunping Qiu, Lukas Liebel, Lloyd H. Hughes et al.
Human Settlement Extent (HSE) and Local Climate Zone (LCZ) maps are both essential sources, e.g., for sustainable urban development and Urban Heat Island (UHI) studies. Remote sensing (RS)- and deep learning (DL)-based classification approaches play a significant role by providing the potential for global mapping. However, most of the efforts only focus on one of the two schemes, usually on a specific scale. This leads to unnecessary redundancies, since the learned features could be leveraged for both of these related tasks. In this letter, the concept of multi-task learning (MTL) is introduced to HSE regression and LCZ classification for the first time. We propose a MTL framework and develop an end-to-end Convolutional Neural Network (CNN), which consists of a backbone network for shared feature learning, attention modules for task-specific feature learning, and a weighting strategy for balancing the two tasks. We additionally propose to exploit HSE predictions as a prior for LCZ classification to enhance the accuracy. The MTL approach was extensively tested with Sentinel-2 data of 13 cities across the world. The results demonstrate that the framework is able to provide a competitive solution for both tasks.
LGApr 28, 2020
Meta-Learning for Few-Shot Land Cover ClassificationMarc Rußwurm, Sherrie Wang, Marco Körner et al.
The representations of the Earth's surface vary from one geographic region to another. For instance, the appearance of urban areas differs between continents, and seasonality influences the appearance of vegetation. To capture the diversity within a single category, like as urban or vegetation, requires a large model capacity and, consequently, large datasets. In this work, we propose a different perspective and view this diversity as an inductive transfer learning problem where few data samples from one region allow a model to adapt to an unseen region. We evaluate the model-agnostic meta-learning (MAML) algorithm on classification and segmentation tasks using globally and regionally distributed datasets. We find that few-shot model adaptation outperforms pre-training with regular gradient descent and fine-tuning on (1) the Sen12MS dataset and (2) DeepGlobe data when the source domain and target domain differ. This indicates that model optimization with meta-learning may benefit tasks in the Earth sciences whose data show a high degree of diversity from region to region, while traditional gradient-based supervised learning remains suitable in the absence of a feature or label shift.
CVApr 6, 2020
A Generalized Multi-Task Learning Approach to Stereo DSM Filtering in Urban AreasLukas Liebel, Ksenia Bittner, Marco Körner
City models and height maps of urban areas serve as a valuable data source for numerous applications, such as disaster management or city planning. While this information is not globally available, it can be substituted by digital surface models (DSMs), automatically produced from inexpensive satellite imagery. However, stereo DSMs often suffer from noise and blur. Furthermore, they are heavily distorted by vegetation, which is of lesser relevance for most applications. Such basic models can be filtered by convolutional neural networks (CNNs), trained on labels derived from digital elevation models (DEMs) and 3D city models, in order to obtain a refined DSM. We propose a modular multi-task learning concept that consolidates existing approaches into a generalized framework. Our encoder-decoder models with shared encoders and multiple task-specific decoders leverage roof type classification as a secondary task and multiple objectives including a conditional adversarial term. The contributing single-objective losses are automatically weighted in the final multi-task loss function based on learned uncertainty estimates. We evaluated the performance of specific instances of this family of network architectures. Our method consistently outperforms the state of the art on common data, both quantitatively and qualitatively, and generalizes well to a new dataset of an independent study area.
LGOct 23, 2019
Self-attention for raw optical Satellite Time Series ClassificationMarc Rußwurm, Marco Körner
The amount of available Earth observation data has increased dramatically in the recent years. Efficiently making use of the entire body information is a current challenge in remote sensing and demands for light-weight problem-agnostic models that do not require region- or problem-specific expert knowledge. End-to-end trained deep learning models can make use of raw sensory data by learning feature extraction and classification in one step solely from data. Still, many methods proposed in remote sensing research require implicit feature extraction through data preprocessing or explicit design of features. In this work, we compare recent deep learning models on crop type classification on raw and preprocessed Sentinel 2 data. We concentrate on the common neural network architectures for time series, i.e., 1D-convolutions, recurrence, a shallow random forest baseline, and focus on the novel self-attention architecture. Our central findings are that data preprocessing still increased the overall classification performance for all models while the choice of model was less crucial. Self-attention and recurrent neural networks, by their architecture, outperformed convolutional neural networks on raw satellite time series. We explore this by a feature importance analysis based on gradient back-propagation that exploits the differentiable nature of deep learning models. Further, we qualitatively show how self-attention scores focus selectively on few classification-relevant observations.
CVSep 24, 2019
Enhancing Traffic Scene Predictions with Generative Adversarial NetworksPeter König, Sandra Aigner, Marco Körner
We present a new two-stage pipeline for predicting frames of traffic scenes where relevant objects can still reliably be detected. Using a recent video prediction network, we first generate a sequence of future frames based on past frames. A second network then enhances these frames in order to make them appear more realistic. This ensures the quality of the predicted frames to be sufficient to enable accurate detection of objects, which is especially important for autonomously driving cars. To verify this two-stage approach, we conducted experiments on the Cityscapes dataset. For enhancing, we trained two image-to-image translation methods based on generative adversarial networks, one for blind motion deblurring and one for image super-resolution. All resulting predictions were quantitatively evaluated using both traditional metrics and a state-of-the-art object detection network showing that the enhanced frames appear qualitatively improved. While the traditional image comparison metrics, i.e., MSE, PSNR, and SSIM, failed to confirm this visual impression, the object detection evaluation resembles it well. The best performing prediction-enhancement pipeline is able to increase the average precision values for detecting cars by about 9% for each prediction step, compared to the non-enhanced predictions.
LGAug 27, 2019
Early Classification for Agricultural Monitoring from Satellite Time SeriesMarc Rußwurm, Romain Tavenard, Sébastien Lefèvre et al.
In this work, we introduce a recently developed early classification mechanism to satellite-based agricultural monitoring. It augments existing classification models by an additional stopping probability based on the previously seen information. This mechanism is end-to-end trainable and derives its stopping decision solely from the observed satellite data. We show results on field parcels in central Europe where sufficient ground truth data is available for an empiric evaluation of the results with local phenological information obtained from authorities. We observe that the recurrent neural network outfitted with this early classification mechanism was able to distinguish the many of the crop types before the end of the vegetative period. Further, we associated these stopping times with evaluated ground truth information and saw that the times of classification were related to characteristic events of the observed plants' phenology.
CVJul 25, 2019
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and ClassificationLukas Liebel, Marco Körner
We introduce MultiDepth, a novel training strategy and convolutional neural network (CNN) architecture that allows approaching single-image depth estimation (SIDE) as a multi-task problem. SIDE is an important part of road scene understanding. It, thus, plays a vital role in advanced driver assistance systems and autonomous vehicles. Best results for the SIDE task so far have been achieved using deep CNNs. However, optimization of regression problems, such as estimating depth, is still a challenging task. For the related tasks of image classification and semantic segmentation, numerous CNN-based methods with robust training behavior have been proposed. Hence, in order to overcome the notorious instability and slow convergence of depth value regression during training, MultiDepth makes use of depth interval classification as an auxiliary task. The auxiliary task can be disabled at test-time to predict continuous depth values using the main regression branch more efficiently. We applied MultiDepth to road scenes and present results on the KITTI depth prediction dataset. In experiments, we were able to show that end-to-end multi-task learning with both, regression and classification, is able to considerably improve training and yield more accurate results.
CVApr 22, 2019
Late or Earlier Information Fusion from Depth and Spectral Data? Large-Scale Digital Surface Model Refinement by Hybrid-cGANKsenia Bittner, Marco Körner, Peter Reinartz
We present the workflow of a DSM refinement methodology using a Hybrid-cGAN where the generative part consists of two encoders and a common decoder which blends the spectral and height information within one network. The inputs to the Hybrid-cGAN are single-channel photogrammetric DSMs with continuous values and single-channel pan-chromatic (PAN) half-meter resolution satellite images. Experimental results demonstrate that the earlier information fusion from data with different physical meanings helps to propagate fine details and complete an inaccurate or missing 3D information about building forms. Moreover, it improves the building boundaries making them more rectilinear.
CVMar 8, 2019
DSM Building Shape Refinement from Combined Remote Sensing Images based on Wnet-cGANsKsenia Bittner, Marco Körner, Peter Reinartz
We describe the workflow of a digital surface models (DSMs) refinement algorithm using a hybrid conditional generative adversarial network (cGAN) where the generative part consists of two parallel networks merged at the last stage forming a WNet architecture. The inputs to the so-called WNet-cGAN are stereo DSMs and panchromatic (PAN) half-meter resolution satellite images. Fusing these helps to propagate fine detailed information from a spectral image and complete the missing 3D knowledge from a stereo DSM about building shapes. Besides, it refines the building outlines and edges making them more rectangular and sharp.
CVOct 28, 2018
Convolutional LSTMs for Cloud-Robust Segmentation of Remote Sensing ImageryMarc Rußwurm, Marco Körner
Clouds frequently cover the Earth's surface and pose an omnipresent challenge to optical Earth observation methods. The vast majority of remote sensing approaches either selectively choose single cloud-free observations or employ a pre-classification strategy to identify and mask cloudy pixels. We follow a different strategy and treat cloud coverage as noise that is inherent to the observed satellite data. In prior work, we directly employed a straightforward \emph{convolutional long short-term memory} network for vegetation classification without explicit cloud filtering and achieved state-of-the-art classification accuracies. In this work, we investigate this cloud-robustness further by visualizing internal cell activations and performing an ablation experiment on datasets of different cloud coverage. In the visualizations of network states, we identified some cells in which modulation and input gates closed on cloudy pixels. This indicates that the network has internalized a cloud-filtering mechanism without being specifically trained on cloud labels. Overall, our results question the necessity of sophisticated pre-processing pipelines for multi-temporal deep learning approaches.
CVJul 7, 2018
Towards Multi-class Object Detection in Unconstrained Remote Sensing ImagerySeyed Majid Azimi, Eleonora Vig, Reza Bahmanyar et al.
Automatic multi-class object detection in remote sensing images in unconstrained scenarios is of high interest for several applications including traffic monitoring and disaster management. The huge variation in object scale, orientation, category, and complex backgrounds, as well as the different camera sensors pose great challenges for current algorithms. In this work, we propose a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features. These features are fed into rotation-based region proposal and region of interest networks to produce object detections. Finally, rotational non-maximum suppression is applied to remove redundant detections. During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular. Our method achieves 68.16% mAP on horizontal and 72.45% mAP on oriented bounding box detection tasks on the challenging DOTA dataset, outperforming all published methods by a large margin (+6% and +12% absolute improvement, respectively). Furthermore, it generalizes to two other datasets, NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines even when trained on DOTA. Our method can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations, making it a great choice for unconstrained aerial and satellite imagery.
CVMay 16, 2018
Auxiliary Tasks in Multi-task LearningLukas Liebel, Marco Körner
Multi-task convolutional neural networks (CNNs) have shown impressive results for certain combinations of tasks, such as single-image depth estimation (SIDE) and semantic segmentation. This is achieved by pushing the network towards learning a robust representation that generalizes well to different atomic tasks. We extend this concept by adding auxiliary tasks, which are of minor relevance for the application, to the set of learned tasks. As a kind of additional regularization, they are expected to boost the performance of the ultimately desired main tasks. To study the proposed approach, we picked vision-based road scene understanding (RSU) as an exemplary application. Since multi-task learning requires specialized datasets, particularly when using extensive sets of tasks, we provide a multi-modal dataset for multi-task RSU, called synMT. More than 2.5 $\cdot$ 10^5 synthetic images, annotated with 21 different labels, were acquired from the video game Grand Theft Auto V (GTA V). Our proposed deep multi-task CNN architecture was trained on various combination of tasks using synMT. The experiments confirmed that auxiliary tasks can indeed boost network performance, both in terms of final results and training time.
CVMay 3, 2018
Evaluation of CNN-based Single-Image Depth Estimation MethodsTobias Koch, Lukas Liebel, Friedrich Fraundorfer et al.
While an increasing interest in deep models for single-image depth estimation methods can be observed, established schemes for their evaluation are still limited. We propose a set of novel quality criteria, allowing for a more detailed analysis by focusing on specific characteristics of depth maps. In particular, we address the preservation of edges and planar regions, depth consistency, and absolute distance accuracy. In order to employ these metrics to evaluate and compare state-of-the-art single-image depth estimation approaches, we provide a new high-quality RGB-D dataset. We used a DSLR camera together with a laser scanner to acquire high-resolution images and highly accurate depth maps. Experimental results show the validity of our proposed evaluation protocol.
CVMar 19, 2018
Aerial LaneNet: Lane Marking Semantic Segmentation in Aerial Imagery using Wavelet-Enhanced Cost-sensitive Symmetric Fully Convolutional Neural NetworksSeyed Majid Azimi, Peter Fischer, Marco Körner et al.
The knowledge about the placement and appearance of lane markings is a prerequisite for the creation of maps with high precision, necessary for autonomous driving, infrastructure monitoring, lane-wise traffic management, and urban planning. Lane markings are one of the important components of such maps. Lane markings convey the rules of roads to drivers. While these rules are learned by humans, an autonomous driving vehicle should be taught to learn them to localize itself. Therefore, accurate and reliable lane marking semantic segmentation in the imagery of roads and highways is needed to achieve such goals. We use airborne imagery which can capture a large area in a short period of time by introducing an aerial lane marking dataset. In this work, we propose a Symmetric Fully Convolutional Neural Network enhanced by Wavelet Transform in order to automatically carry out lane marking segmentation in aerial imagery. Due to a heavily unbalanced problem in terms of number of lane marking pixels compared with background pixels, we use a customized loss function as well as a new type of data augmentation step. We achieve a very high accuracy in pixel-wise localization of lane markings without using 3rd-party information. In this work, we introduce the first high-quality dataset used within our experiments which contains a broad range of situations and classes of lane markings representative of current transportation systems. This dataset will be publicly available and hence, it can be used as the benchmark dataset for future algorithms within this domain.
CVFeb 25, 2018
Building Instance Classification Using Street View ImagesJian Kang, Marco Körner, Yuanyuan Wang et al.
Land-use classification based on spaceborne or aerial remote sensing images has been extensively studied over the past decades. Such classification is usually a patch-wise or pixel-wise labeling over the whole image. But for many applications, such as urban population density mapping or urban utility planning, a classification map based on individual buildings is much more informative. However, such semantic classification still poses some fundamental challenges, for example, how to retrieve fine boundaries of individual buildings. In this paper, we proposed a general framework for classifying the functionality of individual buildings. The proposed method is based on Convolutional Neural Networks (CNNs) which classify facade structures from street view images, such as Google StreetView, in addition to remote sensing images which usually only show roof structures. Geographic information was utilized to mask out individual buildings, and to associate the corresponding street view images. We created a benchmark dataset which was used for training and evaluating CNNs. In addition, the method was applied to generate building classification maps on both region and city scales of several cities in Canada and the US. Keywords: CNN, Building instance classification, Street view images, OpenStreetMap
CVFeb 6, 2018
Multi-Temporal Land Cover Classification with Sequential Recurrent EncodersMarc Rußwurm, Marco Körner
Earth observation (EO) sensors deliver data with daily or weekly temporal resolution. Most land use and land cover (LULC) approaches, however, expect cloud-free and mono-temporal observations. The increasing temporal capabilities of today's sensors enables the use of temporal, along with spectral and spatial features. Domains, such as speech recognition or neural machine translation, work with inherently temporal data and, today, achieve impressive results using sequential encoder-decoder structures. Inspired by these sequence-to-sequence models, we adapt an encoder structure with convolutional recurrent layers in order to approximate a phenological model for vegetation classes based on a temporal sequence of Sentinel 2 (S2) images. In our experiments, we visualize internal activations over a sequence of cloudy and non-cloudy images and find several recurrent cells, which reduce the input activity for cloudy observations. Hence, we assume that our network has learned cloud-filtering schemes solely from input data, which could alleviate the need for tedious cloud-filtering as a preprocessing step for many EO approaches. Moreover, using unfiltered temporal series of top-of-atmosphere (TOA) reflectance data, we achieved in our experiments state-of-the-art classification accuracies on a large number of crop classes with minimal preprocessing compared to other classification approaches.