Sébastien Lefèvre

CV
h-index: 9
40 papers
2,543 citations
Novelty: 33%
AI Score: 54

40 Papers

CVApr 25, 2023Code
Change detection needs change information: improving deep 3D point cloud change detection

Iris de Gélis, Thomas Corpetti, Sébastien Lefèvre

Change detection is an important task that rapidly identifies modified areas, particularly when multi-temporal data are concerned. In landscapes with a complex geometry (e.g., urban environment), vertical information is a very useful source of knowledge that highlights changes and classifies them into different categories. In this study, we focus on change segmentation using raw three-dimensional (3D) point clouds (PCs) directly to avoid any information loss due to the rasterization processes. While deep learning has recently proven its effectiveness for this particular task by encoding the information through Siamese networks, we investigate herein the idea of also using change information in the early steps of deep networks. To do this, we first propose to provide a Siamese KPConv state-of-the-art (SoTA) network with hand-crafted features, especially a change-related one, which improves the mean of the Intersection over Union (IoU) over the classes of change by 4.70%. Considering that a major improvement is obtained due to the change-related feature, we then propose three new architectures to address 3D PC change segmentation: OneConvFusion, Triplet KPConv, and Encoder Fusion SiamKPConv. All these networks consider the change information in the early steps and outperform the SoTA methods. In particular, Encoder Fusion SiamKPConv overtakes the SoTA approaches by more than 5% of the mean of the IoU over the classes of change, emphasizing the value of having the network focus on change information for the change detection task. The code is available at https://github.com/IdeGelis/torch-points3d-SiamKPConvVariants.
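
The metric quoted above, the mean IoU over the classes of change, can be illustrated with a minimal sketch. It assumes integer per-point labels and that class 0 stands for "unchanged"; the function name and the class numbering are illustrative and are not taken from the authors' repository.

```python
def mean_iou_over_change_classes(y_true, y_pred, num_classes, unchanged_class=0):
    """Average the per-class IoU over every class except the 'unchanged' one."""
    ious = []
    for c in range(num_classes):
        if c == unchanged_class:
            continue  # assumption: this class is excluded from the average
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 0 = unchanged, 1 and 2 = two hypothetical change classes.
labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
score = mean_iou_over_change_classes(labels, predictions, num_classes=3)
print(round(score, 4))
```

Here class 1 has an IoU of 2/3 and class 2 an IoU of 1/2, so the averaged score is 7/12.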

CVJun 8, 2022Code
Learning Digital Terrain Models from Point Clouds: ALS2DTM Dataset and Rasterization-based GAN

Hoàng-Ân Lê, Florent Guiotte, Minh-Tan Pham et al.

Despite the popularity of deep neural networks in various domains, the extraction of digital terrain models (DTMs) from airborne laser scanning (ALS) point clouds is still challenging. This might be due to the lack of a dedicated large-scale annotated dataset and the data-structure discrepancy between point clouds and DTMs. To promote data-driven DTM extraction, this paper collects from open sources a large-scale dataset of ALS point clouds and corresponding DTMs with various urban, forested, and mountainous scenes. A baseline method is proposed as the first attempt to train a Deep neural network to extract digital Terrain models directly from ALS point clouds via Rasterization techniques, coined DeepTerRa. Extensive studies with well-established methods are performed to benchmark the dataset and analyze the challenges in learning to extract DTMs from point clouds. The experimental results show the interest of the agnostic data-driven approach, with sub-metric error level compared to methods designed for DTM extraction. The data and source code are provided at https://lhoangan.github.io/deepterra/ for reproducibility and further similar research.
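
To make the rasterization step more concrete, here is a minimal sketch of one common way to turn a point cloud into a grid: keep the lowest elevation that falls in each cell. The choice of the minimum, the function name, and the grid parameters are assumptions made for illustration; the abstract does not say which rasterization DeepTerRa uses.

```python
def rasterize_min_z(points, cell_size, x_min, y_min, n_cols, n_rows):
    """Bin (x, y, z) points into a regular grid, keeping the lowest z per cell.

    Cells that receive no point stay None.
    """
    grid = [[None] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:  # drop points outside the grid
            if grid[row][col] is None or z < grid[row][col]:
                grid[row][col] = z
    return grid


# Toy point cloud: three points inside a 2 x 2 grid of 1 m cells, one outside.
cloud = [(0.2, 0.3, 10.0), (0.8, 0.1, 12.5), (1.5, 0.5, 9.0), (5.0, 5.0, 1.0)]
raster = rasterize_min_z(cloud, cell_size=1.0, x_min=0.0, y_min=0.0, n_cols=2, n_rows=2)
print(raster)
```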

CVApr 14, 2022Code
CroCo: Cross-Modal Contrastive learning for localization of Earth Observation data

Wei-Hsin Tseng, Hoàng-Ân Lê, Alexandre Boulch et al.

It is of interest to localize a ground-based LiDAR point cloud on remote sensing imagery. In this work, we tackle a subtask of this problem, i.e. to map a digital elevation model (DEM) rasterized from an aerial LiDAR point cloud onto aerial imagery. We propose a contrastive learning-based method that trains on DEM and high-resolution optical imagery, and we experiment with the framework under different data sampling strategies and hyperparameters. In the best scenario, a Top-1 score of 0.71 and a Top-5 score of 0.81 are obtained. The proposed method is promising for feature learning from RGB and DEM for localization and is potentially applicable to other data sources too. Source code will be released at https://github.com/wtseng530/AVLocalization.
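
As a rough illustration of how Top-1 and Top-5 scores of this kind are usually computed, the sketch below assumes a retrieval setting in which each query (for example a DEM patch) has exactly one correct candidate (the image patch with the same index) and candidates are ranked by a similarity score. This protocol and all names are assumptions; the abstract does not describe the exact evaluation.

```python
def top_k_score(similarity, k):
    """Fraction of queries whose correct candidate is among the k best-ranked.

    similarity[i][j] is the score between query i and candidate j;
    the correct match for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy similarity matrix for three queries and three candidates.
sim = [
    [0.9, 0.1, 0.3],
    [0.8, 0.6, 0.2],
    [0.1, 0.2, 0.7],
]
top1 = top_k_score(sim, 1)
top2 = top_k_score(sim, 2)
print(top1, top2)
```

In this toy matrix the second query ranks a wrong candidate first, so it counts as a miss at k = 1 and as a hit at k = 2.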

CVJul 7, 2023
A Deep Active Contour Model for Delineating Glacier Calving Fronts

Konrad Heidler, Lichao Mou, Erik Loebel et al.

Choosing how to encode a real-world problem as a machine learning task is an important design decision in machine learning. The task of glacier calving front modeling has often been approached as a semantic segmentation task. Recent studies have shown that combining segmentation with edge detection can improve the accuracy of calving front detectors. Building on this observation, we completely rephrase the task as a contour tracing problem and propose a model for explicit contour detection that does not incorporate any dense predictions as intermediate steps. The proposed approach, called "Charting Outlines by Recurrent Adaptation" (COBRA), combines Convolutional Neural Networks (CNNs) for feature extraction and active contour models for the delineation. By training and evaluating on several large-scale datasets of Greenland's outlet glaciers, we show that this approach indeed outperforms the aforementioned methods based on segmentation and edge-detection. Finally, we demonstrate that explicit contour detection has benefits over pixel-wise methods when quantifying the models' prediction uncertainties. The project page containing the code and animated model predictions can be found at https://khdlr.github.io/COBRA/.

CVAug 31, 2024
Mapping earth mounds from space

Baki Uzun, Shivam Pande, Gwendal Cachin-Bernard et al.

Regular patterns of vegetation are considered widespread landscapes, although their global extent has never been estimated. Among them, spotted landscapes are of particular interest in the context of climate change. Indeed, regularly spaced vegetation spots in semi-arid shrublands result from extreme resource depletion and prefigure a catastrophic shift of the ecosystem to a homogeneous desert, while termite mounds, which also produce spotted landscapes, were shown to increase robustness to climate change. Yet, their identification at large scale calls for automatic methods, for instance using the popular deep learning framework, able to cope with a vast amount of remote sensing data, e.g., optical satellite imagery. In this paper, we tackle this problem and benchmark some state-of-the-art deep networks on several landscapes and geographical areas. Despite the promising results we obtained, we found that more research is needed to be able to automatically map these earth mounds from space.

CVJul 13, 2023
Multimodal Object Detection in Remote Sensing

Abdelbadie Belmouhcine, Jean-Christophe Burnel, Luc Courtrai et al.

Object detection in remote sensing is a crucial computer vision task that has seen significant advancements with deep learning techniques. However, most existing works in this area focus on the use of generic object detection and do not leverage the potential of multimodal data fusion. In this paper, we present a comparison of methods for multimodal object detection in remote sensing, survey available multimodal datasets suitable for evaluation, and discuss future directions.

CVJul 13, 2023
Weakly supervised marine animal detection from remote sensing images using vector-quantized variational autoencoder

Minh-Tan Pham, Hugo Gangloff, Sébastien Lefèvre

This paper studies a reconstruction-based approach for weakly-supervised animal detection from aerial images in marine environments. Such an approach leverages an anomaly detection framework that computes metrics directly on the input space, enhancing interpretability and anomaly localization compared to feature embedding methods. Building upon the success of Vector-Quantized Variational Autoencoders in anomaly detection on computer vision datasets, we adapt them to the marine animal detection domain and address the challenge of handling noisy data. To evaluate our approach, we compare it with existing methods in the context of marine animal detection from aerial image data. Experiments conducted on two dedicated datasets demonstrate the superior performance of the proposed method over recent studies in the literature. Our framework offers improved interpretability and localization of anomalies, providing valuable insights for monitoring marine ecosystems and mitigating the impact of human activities on marine animals.

CVApr 14, 2022
Detection of Degraded Acacia tree species using deep neural networks on uav drone imagery

Anne Achieng Osio, Hoàng-Ân Lê, Samson Ayugi et al.

Deep-learning-based image classification and object detection have been applied successfully to tree monitoring. However, tree crowns and fallen trees, especially in flood-inundated areas, remain largely unexplored. Detection of degraded tree trunks in natural environments such as water, mudflats, and natural vegetated areas is challenging due to the mixed colour image backgrounds. In this paper, Unmanned Aerial Vehicles (UAVs), or drones, with embedded RGB cameras were used to capture the fallen Acacia Xanthophloea trees from six designated plots around Lake Nakuru, Kenya. Motivated by the need to detect fallen trees around the lake, two well-established deep neural networks, i.e. Faster Region-based Convolutional Neural Network (Faster R-CNN) and Retina-Net, were used for fallen tree detection. A total of 7,590 annotations of three classes on 256 x 256 image patches were used for this study. Experimental results show the relevance of deep learning in this context, with the Retina-Net model achieving 38.9% precision and 57.9% recall.

CVAug 31, 2024
Plant detection from ultra high resolution remote sensing images: A Semantic Segmentation approach based on fuzzy loss

Shivam Pande, Baki Uzun, Florent Guiotte et al.

In this study, we tackle the challenge of identifying plant species from ultra high resolution (UHR) remote sensing images. Our approach involves introducing an RGB remote sensing dataset, characterized by millimeter-level spatial resolution, meticulously curated through several field expeditions across a mountainous region in France covering various landscapes. The task of plant species identification is framed as a semantic segmentation problem for its practical and efficient implementation across vast geographical areas. However, when dealing with segmentation masks, we confront instances where distinguishing boundaries between plant species and their background is challenging. We tackle this issue by introducing a fuzzy loss within the segmentation model. Instead of utilizing one-hot encoded ground truth (GT), our model incorporates Gaussian filter refined GT, introducing stochasticity during training. First experimental results obtained on both our UHR dataset and a public dataset are presented, showing the relevance of the proposed methodology, as well as the need for future improvement.
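
The idea of replacing one-hot ground truth with a Gaussian-filtered version can be sketched as follows. This is a simplified 1D illustration under stated assumptions: labels are smoothed along a single row of pixels, borders are handled by replicating the edge pixel, and the soft targets are plugged into a standard cross-entropy. The function names, kernel size, and loss form are illustrative and may differ from the fuzzy loss used in the paper; the stochastic aspect mentioned in the abstract is not modelled here.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def soften_labels(labels, num_classes, sigma=1.0, radius=2):
    """Turn hard labels along a row into per-pixel class distributions."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(labels)
    soft = []
    for i in range(n):
        probs = [0.0] * num_classes
        for offset, w in zip(range(-radius, radius + 1), kernel):
            j = min(max(i + offset, 0), n - 1)  # replicate border pixels
            probs[labels[j]] += w
        soft.append(probs)
    return soft


def soft_cross_entropy(pred_probs, soft_targets, eps=1e-12):
    """Mean cross-entropy between predicted and soft target distributions."""
    total = 0.0
    for p, q in zip(pred_probs, soft_targets):
        total -= sum(qc * math.log(pc + eps) for pc, qc in zip(p, q))
    return total / len(pred_probs)


# A row with a background (0) to plant (1) boundary in the middle.
row = [0, 0, 0, 1, 1, 1]
targets = soften_labels(row, num_classes=2)
uniform_prediction = [[0.5, 0.5] for _ in row]
loss = soft_cross_entropy(uniform_prediction, targets)
print(targets[2], loss)
```

Pixels far from the boundary keep a target close to one-hot, while pixels next to it receive a mixed target, which is the behaviour the soft ground truth is meant to capture.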

CVMay 6, 2025Code
Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach

Pierre Adorni, Minh-Tan Pham, Stéphane May et al.

Foundation models constitute a significant advancement in computer vision: after a single, albeit costly, training phase, they can address a wide array of tasks. In the field of Earth observation, over 75 remote sensing vision foundation models have been developed in the past four years. However, none has consistently outperformed the others across all available downstream tasks. To facilitate their comparison, we propose a cost-effective method for predicting a model's performance on multiple downstream tasks without the need for fine-tuning on each one. This method is based on what we call "capabilities encoding." The utility of this novel approach is twofold: we demonstrate its potential to simplify the selection of a foundation model for a given new task, and we employ it to offer a fresh perspective on the existing literature, suggesting avenues for future research. Codes are available at https://github.com/pierreadorni/capabilities-encoding.

CVDec 17, 2025
From Words to Wavelengths: VLMs for Few-Shot Multispectral Object Detection

Manuel Nkegoum, Minh-Tan Pham, Élisa Fromont et al.

Multispectral object detection is critical for safety-sensitive applications such as autonomous driving and surveillance, where robust perception under diverse illumination conditions is essential. However, the limited availability of annotated multispectral data severely restricts the training of deep detectors. In such data-scarce scenarios, textual class information can serve as a valuable source of semantic supervision. Motivated by the recent success of Vision-Language Models (VLMs) in computer vision, we explore their potential for few-shot multispectral object detection. Specifically, we adapt two representative VLM-based detectors, Grounding DINO and YOLO-World, to handle multispectral inputs and propose an effective mechanism to integrate text, visual and thermal modalities. Through extensive experiments on two popular multispectral image benchmarks, FLIR and M3FD, we demonstrate that VLM-based detectors not only excel in few-shot regimes, significantly outperforming specialized multispectral models trained with comparable data, but also achieve competitive or superior results under fully supervised settings. Our findings reveal that the semantic priors learned by large-scale VLMs effectively transfer to unseen spectral modalities, offering a powerful pathway toward data-efficient multispectral perception.

LGMay 12
NOFE -- Neural Operator Function Embedding

Lars Uebbing, Harald L. Joakimsen, Siyan Chen et al.

Most dimensionality reduction methods treat data as discrete point clouds, ignoring the continuous domain structure inherent to many real-world processes. To bridge this gap, we introduce Neural Operator Function Embedding (NOFE), a domain-aware framework for continuous dimensionality reduction. NOFE learns function-to-function mappings via a Graph Kernel Operator, enabling mesh-free evaluation at arbitrary query locations independent of input discretization. We establish NOFE as an approximation of sheaf-to-sheaf mappings, generalizing Sheaf Neural Networks to continuous domains. We evaluate NOFE across different datasets, comparing it against PCA, t-SNE, and UMAP. Our results demonstrate that NOFE significantly outperforms baselines in local structure preservation, achieving a local Stress of 0.111 compared to 0.398 for PCA, 0.773 for t-SNE, and 0.791 for UMAP on the ERA5 climate reanalysis dataset. NOFE also exhibits robust sampling independence, reducing the Patch Stitching Error by up to 20.0× relative to UMAP (59.0 vs. 267.6 under regional normalization) and ensuring consistency across disjoint domain patches. While maintaining competitive global structure preservation (Stress-1: 0.379 vs. PCA's 0.268), NOFE resolves fine-grained structures and produces smooth, consistent embeddings that generalize across varying sample densities, addressing key limitations of discrete reduction methods.
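
For readers unfamiliar with stress metrics, the sketch below computes a Kruskal-style Stress-1 between pairwise distances in the original space and in the embedding. Normalization conventions differ between references; this version divides by the squared original distances, which is an assumption and may not match the definition used in the paper. The function names are illustrative.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(a, b) for b in points] for a in points]


def stress_1(high_dim, low_dim):
    """sqrt( sum (d_high - d_low)^2 / sum d_high^2 ) over all point pairs."""
    d_high = pairwise_distances(high_dim)
    d_low = pairwise_distances(low_dim)
    num = den = 0.0
    n = len(high_dim)
    for i in range(n):
        for j in range(i + 1, n):
            num += (d_high[i][j] - d_low[i][j]) ** 2
            den += d_high[i][j] ** 2
    return math.sqrt(num / den)


# Two 2D points projected onto their first coordinate.
original = [(0.0, 0.0), (3.0, 4.0)]
embedded = [(0.0,), (3.0,)]
value = stress_1(original, embedded)
print(value)
```

A value of 0 means all pairwise distances are preserved exactly; in the toy example the distance shrinks from 5 to 3, giving a stress of 0.4.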

CVNov 26, 2025Code
EoS-FM: Can an Ensemble of Specialist Models act as a Generalist Feature Extractor?

Pierre Adorni, Minh-Tan Pham, Stéphane May et al.

Recent advances in foundation models have shown great promise in domains such as natural language processing and computer vision, and similar efforts are now emerging in the Earth Observation community. These models aim to generalize across tasks with limited supervision, reducing the need for training separate models for each task. However, current strategies, which largely focus on scaling model size and dataset volume, require prohibitive computational and data resources, limiting accessibility to only a few large institutions. Moreover, this paradigm of ever-larger models stands in stark contrast with the principles of sustainable and environmentally responsible AI, as it leads to immense carbon footprints and resource inefficiency. In this work, we present a novel and efficient alternative: an Ensemble-of-Specialists framework for building Remote Sensing Foundation Models (RSFMs). Our method decomposes the training process into lightweight, task-specific ConvNeXtV2 specialists that can be frozen and reused. This modular approach offers strong advantages in efficiency, interpretability, and extensibility. Moreover, it naturally supports federated training, pruning, and continuous specialist integration, making it particularly well-suited for collaborative and resource-constrained settings. Our framework sets a new direction for building scalable and efficient RSFMs. All codes and pretrained models are available at https://github.com/pierreadorni/EoS-FM.

LGMay 28, 2019Code
BreizhCrops: A Time Series Dataset for Crop Type Mapping

Marc Rußwurm, Charlotte Pelletier, Maximilian Zollner et al.

We present BreizhCrops, a novel benchmark dataset for the supervised classification of field crops from satellite time series. We aggregated label data and Sentinel-2 top-of-atmosphere as well as bottom-of-atmosphere time series in the region of Brittany (Breizh in the local language), north-west France. We compare seven recently proposed deep neural networks along with a Random Forest baseline. The dataset, model (re-)implementations and pre-trained model weights are available at the associated GitHub repository (https://github.com/dl4sits/BreizhCrops) that has been designed with applicability for practitioners in mind. We plan to maintain the repository with additional data and welcome contributions of novel methods to build a state-of-the-art benchmark on methods for crop type mapping.

LGJan 30, 2019Code
End-to-End Learned Early Classification of Time Series for In-Season Crop Type Mapping

Marc Rußwurm, Nicolas Courty, Rémi Emonet et al.

Remote sensing satellites capture the cyclic dynamics of our planet at regular time intervals, recorded in satellite time series data. End-to-end trained deep learning models use this time series data to make predictions at a large scale, for instance, to produce up-to-date crop cover maps. Most time series classification approaches focus on the accuracy of predictions. However, the earliness of the prediction is also of great importance since coming to an early decision can make a crucial difference in time-sensitive applications. In this work, we present an End-to-End Learned Early Classification of Time Series (ELECTS) model that estimates a classification score and a probability of whether sufficient data has been observed to come to an early and still accurate decision. ELECTS is modular: any deep time series classification model can adopt the ELECTS conceptual idea by adding a second prediction head that outputs a probability of stopping the classification. The ELECTS loss function then optimizes the overall model on a balanced objective of earliness and accuracy. Our experiments on four crop classification datasets from Europe and Africa show that ELECTS allows reaching state-of-the-art accuracy while massively reducing the quantity of data to be downloaded, stored, and processed. The source code is available at https://github.com/marccoru/elects.
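
The two-head idea described above can be illustrated at inference time with a minimal sketch. It assumes the model has already produced, for every time step, a vector of class probabilities and a stopping probability, and that classification stops at the first step where the stopping probability reaches a fixed threshold. The deterministic threshold rule and all names are assumptions made for illustration; they are not taken from the ELECTS repository.

```python
def classify_early(class_probs_per_step, stop_probs_per_step, threshold=0.5):
    """Return (stop_step, predicted_class) for one time series.

    The prediction is taken at the first step whose stopping probability
    reaches the threshold, or at the last step if that never happens.
    """
    last = len(stop_probs_per_step) - 1
    for t, (class_probs, stop_prob) in enumerate(
        zip(class_probs_per_step, stop_probs_per_step)
    ):
        if stop_prob >= threshold or t == last:
            predicted = max(range(len(class_probs)), key=lambda c: class_probs[c])
            return t, predicted


# Hypothetical outputs of the two heads over four observation dates.
class_probs = [[0.4, 0.6], [0.5, 0.5], [0.2, 0.8], [0.1, 0.9]]
stop_probs = [0.05, 0.20, 0.70, 0.95]
step, label = classify_early(class_probs, stop_probs)
print(step, label)
```

With these toy values the series is classified at the third date (index 2), so the fourth acquisition would not need to be processed.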

CVSep 25, 2025
FSMODNet: A Closer Look at Few-Shot Detection in Multispectral Data

Manuel Nkegoum, Minh-Tan Pham, Élisa Fromont et al.

Few-shot multispectral object detection (FSMOD) addresses the challenge of detecting objects across visible and thermal modalities with minimal annotated data. In this paper, we explore this complex task and introduce a framework named "FSMODNet" that leverages cross-modality feature integration to improve detection performance even with limited labels. By effectively combining the unique strengths of visible and thermal imagery using deformable attention, the proposed method demonstrates robust adaptability in complex illumination and environmental conditions. Experimental results on two public datasets show effective object detection performance in challenging low-data regimes, outperforming several baselines we established from state-of-the-art models. All code, models, and experimental data splits can be found at https://anonymous.4open.science/r/Test-B48D.

CVJun 17, 2025
Earth Observation Foundation Model PhilEO: Pretraining on the MajorTOM and FastTOM Datasets

Nikolaos Dionelis, Riccardo Musto, Jente Bosmans et al.

Today, Earth Observation (EO) satellites generate massive volumes of data. To fully exploit this, it is essential to pretrain EO Foundation Models (FMs) on large unlabeled datasets, enabling efficient fine-tuning for downstream tasks with minimal labeled data. In this paper, we study scaling up FMs: we train our models on the pretraining dataset MajorTOM 23TB, which includes all regions, and the performance on average is competitive versus models pretrained on more specialized datasets, which are substantially smaller and include only land. The additional data of oceans and ice do not decrease the performance on land-focused downstream tasks. These results indicate that large FMs trained on global datasets for a wider variety of downstream tasks can be useful for downstream applications that only require a subset of the information included in their training. The second contribution is the exploration of U-Net Convolutional Neural Network (CNN), Vision Transformers (ViT), and Mamba State-Space Models (SSM) as FMs. U-Net captures local correlations amongst pixels, while ViT and Mamba capture local and distant correlations. We develop various models using different architectures, including U-Net, ViT, and Mamba, and different numbers of parameters. We evaluate the FLoating-point OPerations (FLOPs) needed by the models. We fine-tune on the PhilEO Bench for different downstream tasks: roads, buildings, and land cover. For most n-shots for roads and buildings, U-Net 200M-2T outperforms the other models. Using Mamba, we achieve comparable results on the downstream tasks at lower computational expense. We also compare with the recent FM TerraMind, which we evaluate on PhilEO Bench.

CVMay 22, 2025
On the use of Graphs for Satellite Image Time Series

Corentin Dufourg, Charlotte Pelletier, Stéphane May et al.

The Earth's surface is subject to complex and dynamic processes, ranging from large-scale phenomena such as tectonic plate movements to localized changes associated with ecosystems, agriculture, or human activity. Satellite images enable global monitoring of these processes with extensive spatial and temporal coverage, offering advantages over in-situ methods. In particular, resulting satellite image time series (SITS) datasets contain valuable information. To handle their large volume and complexity, some recent works focus on the use of graph-based techniques that abandon the regular Euclidean structure of satellite data to work at an object level. Besides, graphs enable modelling spatial and temporal interactions between identified objects, which are crucial for pattern detection, classification and regression tasks. This paper is an effort to examine the integration of graph-based methods in spatio-temporal remote-sensing analysis. In particular, it aims to present a versatile graph-based pipeline to tackle SITS analysis. It focuses on the construction of spatio-temporal graphs from SITS and their application to downstream tasks. The paper includes a comprehensive review and two case studies, which highlight the potential of graph-based approaches for land cover mapping and water resource forecasting. It also discusses numerous perspectives to resolve current limitations and encourage future developments.

CVMay 9, 2023
DC3DCD: unsupervised learning for multiclass 3D point cloud change detection

Iris de Gélis, Sébastien Lefèvre, Thomas Corpetti

In a constantly evolving world, change detection is of prime importance to keep maps up to date. To better sense areas with complex geometry (urban areas in particular), considering 3D data appears to be an interesting alternative to classical 2D images. In this context, 3D point clouds (PCs), whether obtained through LiDAR or photogrammetric techniques, provide valuable information. While recent studies showed the considerable benefit of using deep learning-based methods to detect and characterize changes in raw 3D PCs, these studies rely on large annotated training data to obtain accurate results. The collection of these annotations is tricky and time-consuming. The availability of unsupervised or weakly supervised approaches is then of prime interest. In this paper, we propose an unsupervised method, called DeepCluster 3D Change Detection (DC3DCD), to detect and categorize multiclass changes at point level. We classify our approach in the unsupervised family given the fact that we extract in a completely unsupervised way a number of clusters associated with potential changes. Note that at the end of the process, the user only has to assign a label to each of these clusters to derive the final change map. Our method builds upon the DeepCluster approach, originally designed for image classification, to handle complex raw 3D PCs and perform the change segmentation task. An assessment of the method on both simulated and real public datasets is provided. The proposed method outperforms a fully-supervised traditional machine learning algorithm and is competitive with fully-supervised deep learning networks applied on rasterization of 3D PCs, with a mean of IoU over classes of change of 57.06% and 66.69% for the simulated and the real datasets, respectively.

CVNov 4, 2021
TimeMatch: Unsupervised Cross-Region Adaptation by Temporal Shift Estimation

Joachim Nyborg, Charlotte Pelletier, Sébastien Lefèvre et al.

The recent developments of deep learning models that capture complex temporal patterns of crop phenology have greatly advanced crop classification from Satellite Image Time Series (SITS). However, when applied to target regions spatially different from the training region, these models perform poorly without any target labels due to the temporal shift of crop phenology between regions. Although various unsupervised domain adaptation techniques have been proposed in recent years, no method explicitly learns the temporal shift of SITS and thus provides only limited benefits for crop classification. To address this, we propose TimeMatch, which explicitly accounts for the temporal shift for improved SITS-based domain adaptation. In TimeMatch, we first estimate the temporal shift from the target to the source region using the predictions of a source-trained model. Then, we re-train the model for the target region by an iterative algorithm where the estimated shift is used to generate accurate target pseudo-labels. Additionally, we introduce an open-access dataset for cross-region adaptation from SITS in four different regions in Europe. On our dataset, we demonstrate that TimeMatch outperforms all competing methods by 11% in average F1-score across five different adaptation scenarios, setting a new state-of-the-art in cross-region adaptation.

CVOct 15, 2020
Semi-Supervised Semantic Segmentation in Earth Observation: The MiniFrance Suite, Dataset Analysis and Multi-task Network Study

Javiera Castillo-Navarro, Bertrand Le Saux, Alexandre Boulch et al.

The development of semi-supervised learning techniques is essential to enhance the generalization capacities of machine learning algorithms. Indeed, raw image data are abundant while labels are scarce, therefore it is crucial to leverage unlabeled inputs to build better models. The availability of large databases has been key for the development of learning algorithms with high level performance. Despite the major role of machine learning in Earth Observation to derive products such as land cover maps, datasets in the field are still limited, either because of modest surface coverage, lack of variety of scenes or restricted classes to identify. We introduce a novel large-scale dataset for semi-supervised semantic segmentation in Earth Observation, the MiniFrance suite. MiniFrance has several unprecedented properties: it is large-scale, containing over 2000 very high resolution aerial images, accounting for more than 200 billion samples (pixels); it is varied, covering 16 conurbations in France, with various climates, different landscapes, and urban as well as countryside scenes; and it is challenging, considering land use classes with high-level semantics. Nevertheless, the most distinctive quality of MiniFrance is being the only dataset in the field especially designed for semi-supervised learning: it contains labeled and unlabeled images in its training partition, which reproduces a life-like scenario. Along with this dataset, we present tools for data representativeness analysis in terms of appearance similarity and a thorough study of MiniFrance data, demonstrating that it is suitable for learning and generalizes well in a semi-supervised setting. Finally, we present semi-supervised deep architectures based on multi-task learning and the first experiments on MiniFrance.

CVMar 23, 2020
GeoGraph: Learning graph-based multi-view object detection with geometric cues end-to-end

Ahmed Samy Nassar, Stefano D'Aronco, Sébastien Lefèvre et al.

In this paper we propose an end-to-end learnable approach that detects static urban objects from multiple views, re-identifies instances, and finally assigns a geographic position per object. Our method relies on a Graph Neural Network (GNN) to detect all objects and output their geographic positions given images and approximate camera poses as input. Our GNN simultaneously models relative pose and image evidence, and is further able to deal with an arbitrary number of input views. Our method is robust to occlusion, similar appearance of neighboring objects, and severe changes in viewpoint by jointly reasoning about visual image appearance and relative pose. Experimental evaluation on two challenging, large-scale datasets and comparison with state-of-the-art methods show significant and systematic improvements both in accuracy and efficiency, with 2-6% gain in detection and re-ID average precision as well as 8x reduction of training time.

CVOct 31, 2019
Very high resolution Airborne PolSAR Image Classification using Convolutional Neural Networks

Minh-Tan Pham, Sébastien Lefèvre

In this work, we exploit convolutional neural networks (CNNs) for the classification of very high resolution (VHR) polarimetric SAR (PolSAR) data. Due to the significant appearance of heterogeneous textures within these data, not only polarimetric features but also structural tensors are exploited to feed CNN models. For deep networks, we use the SegNet model for semantic segmentation, which corresponds to pixelwise classification in remote sensing. Our experiments on the airborne F-SAR data show that for VHR PolSAR images, SegNet could provide high accuracy for the classification task; and introducing structural tensors together with polarimetric features as inputs could help the network to focus more on geometrical information to significantly improve the classification performance.

CVOct 22, 2019
Vehicle detection and counting from VHR satellite images: efforts and open issues

Alice Froidevaux, Andréa Julier, Agustin Lifschitz et al.

Detection of new infrastructures (commercial, logistics, industrial or residential) from satellite images constitutes a proven method to investigate and follow economic and urban growth. The level of activities or exploitation of these sites can hardly be determined by building inspection, but could be inferred from vehicle presence in nearby streets and parking lots. We present in this paper two deep learning-based models for vehicle counting from optical satellite images coming from the Pleiades sensor at 50-cm spatial resolution. Both segmentation (Tiramisu) and detection (YOLO) architectures were investigated. These networks were adapted, trained and validated on a data set including 87k vehicles, annotated using an interactive semi-automatic tool developed by the authors. Experimental results show that both segmentation and detection models could achieve a precision rate higher than 85% along with a high recall rate (76.4% and 71.9% for Tiramisu and YOLO respectively).

NESep 4, 2019
Distance transform regression for spatially-aware deep semantic segmentation

Nicolas Audebert, Alexandre Boulch, Bertrand Le Saux et al.

Understanding visual scenes relies more and more on dense pixel-wise classification obtained via deep fully convolutional neural networks. However, due to the nature of the networks, predictions often suffer from blurry boundaries and ill-segmented shapes, fueling the need for post-processing. This work introduces a new semantic segmentation regularization based on the regression of a distance transform. After computing the distance transform on the label masks, we train an FCN in a multi-task setting in both discrete and continuous spaces by jointly learning classification and distance regression. This requires almost no modification of the network structure and adds a very low overhead to the training process. Learning to approximate the distance transform back-propagates spatial cues that implicitly regularize the segmentation. We validate this technique with several architectures on various datasets, and we show significant improvements compared to competitive baselines.
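
As a rough illustration of the first step mentioned in this abstract (computing a distance transform on label masks to obtain a regression target), here is a minimal sketch in plain Python. The brute-force Euclidean distance, the one-map-per-class layout, and the names `distance_transform` and `per_class_targets` are assumptions made for illustration; the abstract does not say which distance, normalization, or implementation is used.

```python
import math

def distance_transform(mask):
    """Brute-force Euclidean distance transform of a binary mask.

    For each pixel inside the mask (value 1), compute the distance to the
    nearest pixel outside the mask (value 0). Pixels outside the mask get 0.0.
    If the mask has no zero pixel at all, every distance is left at 0.0.
    """
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if mask[r][c] == 0]
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 1 and background:
                out[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return out

def per_class_targets(labels, num_classes):
    """Build one distance map per class from an integer label image (assumed layout)."""
    targets = []
    for k in range(num_classes):
        mask = [[1 if v == k else 0 for v in row] for row in labels]
        targets.append(distance_transform(mask))
    return targets

# Toy label image: a 3x3 object of class 1 on a class-0 background.
labels = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
targets = per_class_targets(labels, num_classes=2)
```

In a multi-task setup of the kind described, maps like `targets` would presumably serve as continuous regression targets alongside the discrete class labels; how the two losses are weighted is not given in the abstract.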

LGAug 27, 2019
Early Classification for Agricultural Monitoring from Satellite Time Series

Marc Rußwurm, Romain Tavenard, Sébastien Lefèvre et al.

In this work, we apply a recently developed early classification mechanism to satellite-based agricultural monitoring. It augments existing classification models with an additional stopping probability based on the previously seen information. This mechanism is end-to-end trainable and derives its stopping decision solely from the observed satellite data. We show results on field parcels in central Europe where sufficient ground truth data is available for an empirical evaluation of the results with local phenological information obtained from authorities. We observe that the recurrent neural network outfitted with this early classification mechanism was able to distinguish many of the crop types before the end of the vegetative period. Further, we associated these stopping times with evaluated ground truth information and saw that the times of classification were related to characteristic events of the observed plants' phenology.
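
One possible reading of the stopping mechanism: if the model emits a stopping probability at every time step, the probability of stopping exactly at step t is that step's value multiplied by the probability of not having stopped earlier. The sketch below assumes this formulation, and also assumes that any remaining probability mass is assigned to the final step. Neither assumption is stated in the abstract, and the function names are hypothetical.

```python
def stopping_distribution(deltas):
    """Turn per-step stopping probabilities into a distribution over stop times.

    P(stop at t) = deltas[t] * prod_{s < t} (1 - deltas[s]).
    Assumption: whatever probability mass is left goes to the last step,
    so a decision is always made by the end of the series.
    """
    probs = []
    remaining = 1.0  # probability of not having stopped yet
    for t, delta in enumerate(deltas):
        is_last = t == len(deltas) - 1
        p = remaining if is_last else remaining * delta
        probs.append(p)
        remaining -= p
    return probs

def expected_stop_time(probs):
    """Expected index of the time step at which classification happens."""
    return sum(t * p for t, p in enumerate(probs))

# Hypothetical per-step stopping probabilities for a four-step series.
deltas = [0.0, 0.5, 0.5, 0.25]
probs = stopping_distribution(deltas)
mean_stop = expected_stop_time(probs)
```

Under these assumptions the values in `probs` sum to one, so they can be used to weight per-step classification losses or to report an expected classification time.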

LGApr 24, 2019
Deep Learning for Classification of Hyperspectral Data: A Comparative Review

Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre

In recent years, deep learning techniques revolutionized the way remote sensing data are processed. Classification of hyperspectral data is no exception to the rule, but has intrinsic specificities which make application of deep learning less straightforward than with other optical data. This article presents a state of the art of previous machine learning approaches, reviews the various deep learning approaches currently proposed for hyperspectral classification, and identifies the problems and difficulties which arise to implement deep neural networks for this task. In particular, the issues of spatial and spectral resolution, data volume, and transfer of models from multimedia images to hyperspectral data are addressed. Additionally, a comparative study of various families of network architectures is provided and a software toolbox is publicly released to allow experimenting with these methods. This article is intended for both data scientists with interest in hyperspectral data and remote sensing experts eager to apply deep learning techniques to their own dataset.

CVJun 18, 2018
Classification of remote sensing images using attribute profiles and feature profiles from different trees: a comparative study

Minh-Tan Pham, Erchan Aptoula, Sébastien Lefèvre

The motivation of this paper is to conduct a comparative study on remote sensing image classification using the morphological attribute profiles (APs) and feature profiles (FPs) generated from different types of tree structures. Over the past few years, APs have been among the most effective methods to model the image's spatial and contextual information. Recently, a novel extension of APs called FPs has been proposed by replacing pixel gray-levels with some statistical and geometrical features when forming the output profiles. FPs have been proved to be more efficient than the standard APs when generated from component trees (max-tree and min-tree). In this work, we investigate their performance on the inclusion tree (tree of shapes) and partition trees (alpha tree and omega tree). Experimental results from both panchromatic and hyperspectral images again confirm the efficiency of FPs compared to APs.

NEJun 7, 2018
Generative Adversarial Networks for Realistic Synthesis of Hyperspectral Samples

Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre

This work addresses the scarcity of annotated hyperspectral data required to train deep neural networks. In particular, we investigate generative adversarial networks and their application to the synthesis of consistent labeled spectra. By training such networks on public datasets, we show that these models are not only able to capture the underlying distribution, but also to generate genuine-looking and physically plausible spectra. Moreover, we experimentally validate that the synthetic samples can be used as an effective data augmentation strategy. We validate our approach on several public hyperspectral datasets using a variety of deep classifiers.

CVMar 27, 2018
Recent Developments from Attribute Profiles for Remote Sensing Image Classification

Minh-Tan Pham, Sébastien Lefèvre, Erchan Aptoula et al.

Morphological attribute profiles (APs) are among the most effective methods to model the spatial and contextual information for the analysis of remote sensing images, especially for classification tasks. Since their introduction to this field in the early 2010s, many research studies have sought not only to exploit and adapt their use to different applications, but also to extend and improve their performance to better deal with more complex data. In this paper, we revisit and discuss different developments and extensions of APs which have drawn significant attention from researchers in the past few years. These studies are analyzed and gathered based on the concept of multi-stage AP construction. In our experiments, a comparative study of classification results on two remote sensing datasets is provided in order to show their significant improvements compared to the originally proposed APs.

CVMar 22, 2018
Buried object detection from B-scan ground penetrating radar data using Faster-RCNN

Minh-Tan Pham, Sébastien Lefèvre

In this paper, we adapt the Faster-RCNN framework for the detection of underground buried objects (i.e. hyperbola reflections) in B-scan ground penetrating radar (GPR) images. Due to the lack of real data for training, we propose to incorporate more simulated radargrams generated from different configurations using the gprMax toolbox. Our designed CNN is first pre-trained on the grayscale CIFAR-10 database. Then, the Faster-RCNN framework based on the pre-trained CNN is trained and fine-tuned on both real and simulated GPR data. Preliminary detection results show that the proposed technique can provide significant improvements compared to classical computer vision methods and hence is quite promising for dealing with this kind of specific GPR data even with few training samples.

NENov 23, 2017
Beyond RGB: Very High Resolution Urban Remote Sensing With Multimodal Deep Networks

Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre

In this work, we investigate various methods to deal with semantic labeling of very high resolution multi-modal remote sensing data. In particular, we study how deep fully convolutional networks can be adapted to deal with multi-modal and multi-scale remote sensing data for semantic labeling. Our contributions are threefold: a) we present an efficient multi-scale approach to leverage both a large spatial context and the high resolution data, b) we investigate early and late fusion of Lidar and multispectral data, c) we validate our methods on two public datasets with state-of-the-art results. Our results indicate that late fusion makes it possible to recover from errors stemming from ambiguous data, while early fusion allows for better joint-feature learning but at the cost of higher sensitivity to missing data.
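
To make the early versus late fusion distinction concrete, here is a minimal sketch in plain Python. It assumes that early fusion means stacking the channels of both modalities before a single network, and that late fusion means averaging the per-class scores of two separate streams. The averaging rule, the toy values, and the function names are illustrative assumptions; the abstract does not specify the fusion operators actually used.

```python
def early_fusion(optical, lidar):
    """Stack the channels of both modalities pixel by pixel.

    Each input is a list of per-pixel channel lists; the result would be fed
    to a single network that learns joint features.
    """
    if len(optical) != len(lidar):
        raise ValueError("both modalities must cover the same pixels")
    return [o + l for o, l in zip(optical, lidar)]

def late_fusion(scores_a, scores_b):
    """Average the per-class scores of two independently run streams (assumed rule)."""
    return [
        [(a + b) / 2 for a, b in zip(pixel_a, pixel_b)]
        for pixel_a, pixel_b in zip(scores_a, scores_b)
    ]

# Two pixels: three multispectral channels and one Lidar-derived channel each.
optical = [[0.2, 0.4, 0.1], [0.7, 0.6, 0.5]]
lidar = [[1.5], [0.3]]
fused_input = early_fusion(optical, lidar)

# Hypothetical two-class scores from an optical stream and a Lidar stream.
scores_optical = [[0.9, 0.1], [0.4, 0.6]]
scores_lidar = [[0.5, 0.5], [0.2, 0.8]]
fused_scores = late_fusion(scores_optical, scores_lidar)
```

The sketch also hints at the trade-off reported above: with early fusion a missing Lidar channel changes the network input itself, whereas with late fusion each stream still produces scores on its own.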

CVMay 23, 2017
Towards seamless multi-view scene analysis from satellite to street-level

Sébastien Lefèvre, Devis Tuia, Jan Dirk Wegner et al.

In this paper, we discuss and review how combined multi-view imagery from satellite to street-level can benefit scene analysis. Numerous works exist that merge information from remote sensing and images acquired from the ground for tasks like land cover mapping, object detection, or scene understanding. What makes the combination of overhead and street-level images challenging, is the strongly varying viewpoint, different scale, illumination, sensor modality and time of acquisition. Direct (dense) matching of images on a per-pixel basis is thus often impossible, and one has to resort to alternative strategies that will be discussed in this paper. We review recent works that attempt to combine images taken from the ground and overhead views for purposes like scene registration, reconstruction, or classification. Three methods that represent the wide range of potential methods and applications (change detection, image orientation, and tree cataloging) are described in detail. We show that cross-fertilization between remote sensing, computer vision and machine learning is very valuable to make the best of geographic data available from Earth Observation sensors and ground imagery. Despite its challenges, we believe that integrating these complementary data sources will lead to major breakthroughs in Big GeoData.

CVMay 17, 2017
Joint Learning from Earth Observation and OpenStreetMap Data to Get Faster Better Semantic Maps

Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre

In this work, we investigate the use of OpenStreetMap data for semantic labeling of Earth Observation images. Deep neural networks have been used in the past for remote sensing data classification from various sensors, including multispectral, hyperspectral, SAR and LiDAR data. While OpenStreetMap has already been used as ground truth data for training such networks, this abundant data source remains rarely exploited as an input information layer. In this paper, we study different use cases and deep network architectures to leverage OpenStreetMap data for semantic labeling of aerial and satellite images. In particular, we look into fusion-based architectures and coarse-to-fine segmentation to include the OpenStreetMap layer into multispectral-based deep fully convolutional networks. We illustrate how these methods can be successfully used on two public datasets: ISPRS Potsdam and DFC2017. We show that OpenStreetMap data can efficiently be integrated into the vision-based deep learning models and that it significantly improves both the accuracy and the convergence speed of the networks.

NEJan 20, 2017
Fusion of Heterogeneous Data in Convolutional Networks for Urban Semantic Labeling (Invited Paper)

Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre

In this work, we present a novel module to perform fusion of heterogeneous data using fully convolutional networks for semantic labeling. We introduce residual correction as a way to learn how to fuse predictions coming out of a dual stream architecture. In particular, we perform fusion of DSM and IRRG optical data on the ISPRS Vaihingen dataset over an urban area and obtain new state-of-the-art results.

CVSep 22, 2016
How Useful is Region-based Classification of Remote Sensing Images in a Deep Learning Framework?

Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre

In this paper, we investigate the impact of segmentation algorithms as a preprocessing step for classification of remote sensing images in a deep learning framework. In particular, we address the issue of segmenting the image into regions to be classified using pre-trained deep neural networks as feature extractors for an SVM-based classifier. An efficient segmentation as a preprocessing step helps learning by adding a spatially-coherent structure to the data. Therefore, we compare algorithms producing superpixels with more traditional remote sensing segmentation algorithms and measure the variation in terms of classification accuracy. We establish that superpixel algorithms allow for a better classification accuracy as a homogeneous and compact segmentation favors better generalization of the training samples.

CVSep 22, 2016
Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks

Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre

This work investigates the use of deep fully convolutional neural networks (DFCNN) for pixel-wise scene labeling of Earth Observation images. In particular, we train a variant of the SegNet architecture on remote sensing data over an urban area and study different strategies for performing accurate semantic segmentation. Our contributions are the following: 1) we efficiently transfer a DFCNN from generic everyday images to remote sensing images; 2) we introduce a multi-kernel convolutional layer for fast aggregation of predictions at multiple scales; 3) we perform data fusion from heterogeneous sensors (optical and laser) using residual correction. Our framework improves state-of-the-art accuracy on the ISPRS Vaihingen 2D Semantic Labeling dataset.

NESep 22, 2016
On the usability of deep networks for object-based image analysis

Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre

As in computer vision before it, remote sensing has been radically changed by the introduction of Convolutional Neural Networks. Land cover use, object detection and scene understanding in aerial images rely more and more on deep learning to achieve new state-of-the-art results. Recent architectures such as Fully Convolutional Networks (Long et al., 2015) can even produce pixel level annotations for semantic mapping. In this work, we show how to use such deep networks to detect, segment and classify different varieties of wheeled vehicles in aerial images from the ISPRS Potsdam dataset. This allows us to tackle object detection and classification on a complex dataset made up of visually similar classes, and to demonstrate the relevance of such a subclass modeling approach. In particular, we want to show that deep learning is also suitable for object-oriented analysis of Earth Observation data. First, we train an FCN variant on the ISPRS Potsdam dataset and show how the learnt semantic maps can be used to extract precise segmentation of vehicles, which allows us to study the spatial distribution of vehicles in the city. Second, we train a CNN to perform vehicle classification on the VEDAI (Razakarivony and Jurie, 2016) dataset, and transfer its knowledge to classify candidate segmented vehicles on the Potsdam dataset.

CVJun 15, 2016
Combining multiscale features for classification of hyperspectral images: a sequence based kernel approach

Yanwei Cui, Laetitia Chapel, Sébastien Lefèvre

Nowadays, hyperspectral image classification widely relies on spatial information to improve accuracy. One of the most popular ways to integrate such information is to extract hierarchical features from a multiscale segmentation. In the classification context, the extracted features are commonly concatenated into a long vector (also called stacked vector), to which a conventional vector-based machine learning technique is applied (e.g. SVM with Gaussian kernel). In this paper, we instead propose to use a sequence-structured kernel: the spectrum kernel. We show that the conventional stacked vector-based kernel is actually a special case of this kernel. Experiments conducted on various publicly available hyperspectral datasets illustrate the improvement of the proposed kernel w.r.t. conventional ones using the same hierarchical spatial features.
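
A small numerical check may help build intuition for why a stacked-vector Gaussian kernel can be viewed as a kernel over the whole sequence of scales: a Gaussian kernel on concatenated features equals the product of Gaussian kernels computed scale by scale with the same bandwidth. The sketch below only illustrates this algebraic identity. It does not implement the spectrum kernel proposed in the paper, and the feature values and function names are made up for the example.

```python
import math

def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) between two vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def stacked_kernel(seq_x, seq_y, gamma):
    """Gaussian kernel on the concatenation of all per-scale feature vectors."""
    flat_x = [v for level in seq_x for v in level]
    flat_y = [v for level in seq_y for v in level]
    return gaussian_kernel(flat_x, flat_y, gamma)

def product_kernel(seq_x, seq_y, gamma):
    """Product of per-scale Gaussian kernels along the sequence of scales."""
    k = 1.0
    for level_x, level_y in zip(seq_x, seq_y):
        k *= gaussian_kernel(level_x, level_y, gamma)
    return k

# Two pixels, each described by features at three scales (toy values).
pixel_a = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4]]
pixel_b = [[0.2, 0.1], [0.3, 0.3], [0.1, 0.4]]

k_stacked = stacked_kernel(pixel_a, pixel_b, gamma=0.5)
k_product = product_kernel(pixel_a, pixel_b, gamma=0.5)
```

The two values agree up to floating-point error because the squared distance between concatenated vectors is the sum of the per-scale squared distances.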

CVApr 6, 2016
A Subpath Kernel for Learning Hierarchical Image Representations

Yanwei Cui, Laetitia Chapel, Sébastien Lefèvre

Tree kernels have demonstrated their ability to deal with hierarchical data, as the intrinsic tree structure often plays a discriminative role. While such kernels have been successfully applied to various domains such as natural language processing and bioinformatics, they mostly concentrate on ordered trees whose nodes are described by symbolic data. Meanwhile, hierarchical representations have gained increasing interest to describe image content. This is particularly true in remote sensing, where such representations allow for revealing different objects of interest at various scales through a tree structure. However, the induced trees are unordered and the nodes are equipped with numerical features. In this paper, we propose a new structured kernel for hierarchical image representations which is built on the concept of subpath kernel. Experimental results on both artificial and remote sensing datasets show that the proposed kernel manages to deal with the hierarchical nature of the data, leading to better classification rates.