Nikolaos Dionelis

CV
h-index 30
17 papers
214 citations
Novelty 42%
AI Score 40

17 Papers

CV · Apr 15, 2025 · Code
TerraMind: Large-Scale Generative Multimodality for Earth Observation

Johannes Jakubik, Felix Yang, Benedikt Blumenstiel et al.

We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Unlike other multimodal models, TerraMind is pretrained on dual-scale representations combining both token-level and pixel-level data across modalities. On a token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while on a pixel level, TerraMind leverages fine-grained representations to capture critical spatial nuances. We pretrained TerraMind on nine geospatial modalities of a global, large-scale dataset. In this paper, we demonstrate that (i) TerraMind's dual-scale early fusion approach unlocks a range of zero-shot and few-shot applications for Earth observation, (ii) TerraMind introduces "Thinking-in-Modalities" (TiM) -- the capability of generating additional artificial data during finetuning and inference to improve the model output -- and (iii) TerraMind achieves beyond state-of-the-art performance in community-standard benchmarks for EO like PANGAEA. The pretraining dataset, the model weights, and our code are open-sourced under a permissive license.

GR · Jan 30
HeatMat: Simulation of City Material Impact on Urban Heat Island Effect

Marie Reinbigler, Romain Rouffet, Peter Naylor et al.

The Urban Heat Island (UHI) effect, defined as a significant increase in temperature in urban environments compared to surrounding areas, is difficult to study in real cities using sensor data (satellites or in-situ stations) due to their coarse spatial and temporal resolution. Among the factors contributing to this effect are the properties of urban materials, which differ from those in rural areas. To analyze their individual impact and to test new material configurations, a high-resolution simulation at the city scale is required. Estimating the current materials used in a city, including those on building facades, is also challenging. We propose HeatMat, an approach to analyze at high resolution the individual impact of urban materials on the UHI effect in a real city, relying only on open data. We estimate building materials using street-view images and a pre-trained vision-language model (VLM) to supplement existing OpenStreetMap data, which describes the 2D geometry and features of buildings. We further encode this information into a set of 2D maps that represent the city's vertical structure and material characteristics. These maps serve as inputs for our 2.5D simulator, which models coupled heat transfers and enables random-access surface temperature estimation at multiple resolutions, reaching a 20x speedup compared to an equivalent simulation in 3D.

CV · Jan 9, 2024
PhilEO Bench: Evaluating Geo-Spatial Foundation Models

Casper Fibaek, Luke Camilleri, Andreas Luyts et al.

Massive amounts of unlabelled data are captured by Earth Observation (EO) satellites, with the Sentinel-2 constellation generating 1.6 TB of data daily. This makes Remote Sensing a data-rich domain well suited to Machine Learning (ML) solutions. However, a bottleneck in applying ML models to EO is the lack of annotated data, as annotation is a labour-intensive and costly process. As a result, research in this domain has focused on Self-Supervised Learning and Foundation Model approaches. This paper addresses the need to evaluate different Foundation Models on a fair and uniform benchmark by introducing the PhilEO Bench, a novel evaluation framework for EO Foundation Models. The framework comprises a testbed and a novel 400 GB Sentinel-2 dataset containing labels for three downstream tasks: building density estimation, road segmentation, and land cover classification. We present experiments using our framework evaluating different Foundation Models, including Prithvi and SatMAE, at multiple n-shots and convergence rates.

CV · Apr 17, 2024
A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching

Francesco Pro, Nikolaos Dionelis, Luca Maiano et al.

Accurate geo-localization of ground-view images plays an important role across domains as diverse as journalism, forensic analysis, transport, and Earth Observation. This work addresses the problem of matching a query ground-view image with the corresponding satellite image without GPS data. This is done by comparing the features from a ground-view image and a satellite one, leveraging the latter's segmentation mask through a three-stream Siamese-like network. The proposed method, Semantic Align Net (SAN), focuses on limited Field-of-View (FoV) and ground panorama images (images with a FoV of 360°). The novelty lies in the fusion of satellite images in combination with their semantic segmentation masks, aimed at ensuring that the model can extract useful features and focus on the significant parts of the images. This work shows how SAN, through semantic analysis of images, improves the performance on the unlabelled CVUSA dataset for all the tested FoVs.

CV · Apr 17, 2024
Learning from Unlabelled Data with Transformers: Domain Adaptation for Semantic Segmentation of High Resolution Aerial Images

Nikolaos Dionelis, Francesco Pro, Luca Maiano et al.

Data from satellites or aerial vehicles are most often unlabelled. Annotating such data accurately is difficult, requires expertise, and is costly in terms of time. Even if Earth Observation (EO) data were correctly labelled, labels might change over time. Learning from unlabelled data within a semi-supervised learning framework for segmentation of aerial images is challenging. In this paper, we develop a new model for semantic segmentation of unlabelled images, the Non-annotated Earth Observation Semantic Segmentation (NEOS) model. NEOS performs domain adaptation as the target domain does not have ground truth semantic segmentation masks. The distribution inconsistencies between the target and source domains are due to differences in acquisition scenes, environment conditions, sensors, and times. Our model aligns the learned representations of the different domains to make them coincide. The evaluation results show that NEOS is successful and outperforms other models for semantic segmentation of unlabelled data.

CV · Feb 19, 2025
Building Age Estimation: A New Multi-Modal Benchmark Dataset and Community Challenge

Nikolaos Dionelis, Alessandra Feliciotti, Mattia Marconcini et al.

Estimating the construction year of buildings is critical for advancing sustainability, as older structures often lack energy-efficient features. Sustainable urban planning relies on accurate building age data to reduce energy consumption and mitigate climate change. In this work, we introduce MapYourCity, a novel multi-modal benchmark dataset comprising top-view Very High Resolution (VHR) imagery, multi-spectral Earth Observation (EO) data from the Copernicus Sentinel-2 satellite constellation, and co-localized street-view images across various European cities. Each building is labeled with its construction epoch, and the task is formulated as a seven-class classification problem covering periods from 1900 to the present. To advance research in EO generalization and multi-modal learning, we organized a community-driven data challenge in 2024, hosted by ESA Φ-lab, which ran for four months and attracted wide participation. This paper presents the Top-4 performing models from the challenge and their evaluation results. We assess model generalization on cities excluded from training to prevent data leakage, and evaluate performance under missing modality scenarios, particularly when street-view data is unavailable. Results demonstrate that building age estimation is both feasible and effective, even in previously unseen cities and when relying solely on top-view satellite imagery (i.e. with VHR and Sentinel-2 images). The MapYourCity dataset thus provides a valuable resource for developing scalable, real-world solutions in sustainable urban analytics.

CV · Jun 17, 2025
Earth Observation Foundation Model PhilEO: Pretraining on the MajorTOM and FastTOM Datasets

Nikolaos Dionelis, Riccardo Musto, Jente Bosmans et al.

Today, Earth Observation (EO) satellites generate massive volumes of data. To fully exploit this, it is essential to pretrain EO Foundation Models (FMs) on large unlabeled datasets, enabling efficient fine-tuning for downstream tasks with minimal labeled data. In this paper, we study scaling-up FMs: we train our models on the pretraining dataset MajorTOM 23TB, which includes all regions, and the performance on average is competitive versus models pretrained on more specialized datasets which are substantially smaller and include only land. The additional data of oceans and ice do not decrease the performance on land-focused downstream tasks. These results indicate that large FMs trained on global datasets for a wider variety of downstream tasks can be useful for downstream applications that only require a subset of the information included in their training. The second contribution is the exploration of U-Net Convolutional Neural Network (CNN), Vision Transformers (ViT), and Mamba State-Space Models (SSM) as FMs. U-Net captures local correlations amongst pixels, while ViT and Mamba capture local and distant correlations. We develop various models using different architectures, including U-Net, ViT, and Mamba, and different numbers of parameters. We evaluate the FLoating-point OPerations (FLOPs) needed by the models. We fine-tune on the PhilEO Bench for different downstream tasks: roads, buildings, and land cover. For most n-shots for roads and buildings, U-Net 200M-2T outperforms the other models. Using Mamba, we achieve comparable results on the downstream tasks at lower computational cost. We also compare with the recent FM TerraMind, which we evaluate on PhilEO Bench.

CV · Feb 19, 2025
CARE: Confidence-Aware Regression Estimation of building density fine-tuning EO Foundation Models

Nikolaos Dionelis, Jente Bosmans, Nicolas Longépé

Accurate confidence quantification and assessment in pixel-wise regression tasks, which are downstream applications of AI Foundation Models for Earth Observation (EO), is important for deep neural networks to predict their failures, improve their performance, and enhance their capabilities for practical deployment in real-world applications. For pixel-wise regression tasks, specifically those utilizing remote sensing data from satellite imagery in EO Foundation Models, confidence quantification is a critical challenge. The focus of this research work is on developing a Foundation Model using EO satellite data that computes and assigns a confidence metric alongside regression outputs to improve the reliability and interpretability of predictions generated by deep neural networks. To this end, we develop, train and evaluate the proposed Confidence-Aware Regression Estimation (CARE) Foundation Model. Our model CARE computes and assigns confidence to regression results as downstream tasks of a Foundation Model for EO data, and performs a confidence-aware self-corrective learning method for the low-confidence regions. We evaluate CARE on multi-spectral data from the Copernicus Sentinel-2 satellite constellation to estimate building density (i.e. monitoring urban growth), and the experimental results show that the proposed method can be successfully applied to important regression problems in EO and remote sensing. We also show that our model CARE outperforms other baseline methods.

CV · Jun 26, 2024
Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI

Nikolaos Dionelis, Casper Fibaek, Luke Camilleri et al.

When the goal is to solve several problems jointly, with a prescribed high accuracy for each target application, Foundation Models should in most cases be used rather than problem-specific models. We focus on the specific Computer Vision application of Foundation Models for Earth Observation (EO) and geospatial AI. These models can solve important problems we are tackling, including for example land cover classification, crop type mapping, flood segmentation, building density estimation, and road regression segmentation. In this paper, we show that with a limited amount of labelled data, Foundation Models achieve improved performance compared to problem-specific models. In this work, we also present our proposed evaluation benchmark for Foundation Models for EO. Benchmarking the generalization performance of Foundation Models is important as it has become difficult to standardize a fair comparison across the many different models that have been proposed recently. We present the results using our evaluation benchmark for EO Foundation Models and show that Foundation Models are label efficient in the downstream tasks and help us solve problems we are tackling in EO and remote sensing.

CV · Jun 26, 2024
Improving EO Foundation Models with Confidence Assessment for enhanced Semantic segmentation

Nikolaos Dionelis, Nicolas Longépé

Confidence assessments of semantic segmentation algorithms are important. Ideally, deep learning models should have the ability to predict in advance whether their output is likely to be incorrect. Assessing the confidence levels of model predictions in Earth Observation (EO) classification is essential, as it can enhance semantic segmentation performance and help prevent further exploitation of the results in case of erroneous prediction. The model we developed, Confidence Assessment for enhanced Semantic segmentation (CAS), evaluates confidence at both the segment and pixel levels, providing both labels and confidence scores as output. Our model, CAS, identifies segments with incorrect predicted labels using the proposed combined confidence metric, refines the model, and enhances its performance. This work has significant applications, particularly in evaluating EO Foundation Models on semantic segmentation downstream tasks, such as land cover classification using Sentinel-2 satellite data. The evaluation results show that this strategy is effective and that the proposed model CAS outperforms other baseline models.
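As a rough illustration of scoring confidence at both the pixel and the segment level, the sketch below uses the maximum softmax probability as the pixel confidence and its mean over a segment as the segment confidence. These are common baseline choices and are assumptions here: the abstract does not define the combined confidence metric that CAS uses, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_confidence(logits):
    """Return (predicted label, confidence), where confidence is the
    maximum softmax probability (an assumed baseline, not the CAS metric)."""
    probs = softmax(logits)
    conf = max(probs)
    return probs.index(conf), conf

def segment_confidence(pixel_logits, segment_ids):
    """Average the pixel confidences inside each segment."""
    sums, counts = {}, {}
    for logits, seg in zip(pixel_logits, segment_ids):
        _, conf = pixel_confidence(logits)
        sums[seg] = sums.get(seg, 0.0) + conf
        counts[seg] = counts.get(seg, 0) + 1
    return {seg: sums[seg] / counts[seg] for seg in sums}

# Three pixels with two classes each; the first two belong to segment 0.
pixel_logits = [[2.0, 0.0], [1.5, 0.5], [0.0, 0.0]]
segment_ids = [0, 0, 1]
print(segment_confidence(pixel_logits, segment_ids))
```

Segments whose score falls below a chosen threshold could then be flagged for review or refinement; the threshold is application-specific.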

LG · Nov 30, 2021
FROB: Few-shot ROBust Model for Classification and Out-of-Distribution Detection

Nikolaos Dionelis, Mehrdad Yaghoobi, Sotirios A. Tsaftaris

Classification and Out-of-Distribution (OoD) detection in the few-shot setting remain challenging due to rarity, the limited number of samples, and adversarial attacks. Accomplishing these aims is important for critical systems in safety, security, and defence. In parallel, OoD detection is challenging since deep neural network classifiers assign high confidence to OoD samples far from the training data. To address such limitations, we propose the Few-shot ROBust (FROB) model for classification and few-shot OoD detection. We devise FROB for improved robustness and reliable confidence prediction for few-shot OoD detection. We generate the support boundary of the normal class distribution and combine it with few-shot Outlier Exposure (OE). We propose a self-supervised learning few-shot confidence boundary methodology based on generative and discriminative models. The contribution of FROB is the combination of the generated boundary in a self-supervised learning manner and the imposition of low confidence at this learned boundary. FROB implicitly generates strong adversarial samples on the boundary and forces the classifier to be less confident on OoD samples, including those on our boundary. FROB achieves generalization to unseen OoD with applicability to unknown, in the wild, test sets that do not correlate to the training datasets. To improve robustness, FROB redesigns OE to work even in the zero-shot case. By including our boundary, FROB reduces the threshold linked to the model's few-shot robustness; it maintains the OoD performance approximately independent of the number of few-shots. The few-shot robustness analysis evaluation of FROB on different sets and on One-Class Classification (OCC) data shows that FROB achieves competitive performance and outperforms benchmarks in terms of robustness to the outlier few-shot sample population and variability.

LG · Oct 28, 2021
OMASGAN: Out-of-Distribution Minimum Anomaly Score GAN for Sample Generation on the Boundary

Nikolaos Dionelis, Mehrdad Yaghoobi, Sotirios A. Tsaftaris

Generative models trained in an unsupervised manner may assign high likelihood and low reconstruction loss to Out-of-Distribution (OoD) samples. This increases Type II errors and leads to missed anomalies, overall decreasing Anomaly Detection (AD) performance. In addition, AD models underperform due to the rarity of anomalies. To address these limitations, we propose the OoD Minimum Anomaly Score GAN (OMASGAN). OMASGAN generates, in a negative data augmentation manner, anomalous samples on the estimated distribution boundary. These samples are then used to refine an AD model, leading to more accurate estimation of the underlying data distribution including multimodal supports with disconnected modes. OMASGAN performs retraining by including the abnormal minimum-anomaly-score OoD samples generated on the distribution boundary in a self-supervised learning manner. For inference, for AD, we devise a discriminator which is trained with negative and positive samples either generated (negative or positive) or real (only positive). OMASGAN addresses the rarity of anomalies by generating strong and adversarial OoD samples on the distribution boundary using only normal class data, effectively addressing mode collapse. A key characteristic of our model is that it uses any f-divergence distribution metric in its variational representation, not requiring invertibility. OMASGAN does not use feature engineering and makes no assumptions about the data distribution. The evaluation of OMASGAN on image data using the leave-one-out methodology shows that it achieves an improvement of at least 0.24 and 0.07 points in AUROC on average on the MNIST and CIFAR-10 datasets, respectively, over other benchmark and state-of-the-art models for AD.
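For readers unfamiliar with the AUROC metric quoted above, the following minimal sketch computes it from anomaly scores using the pairwise-ranking (Mann-Whitney) identity: AUROC equals the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal sample. The toy scores and labels are made up for illustration and are not results from the paper.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for anomalous (positive), 0 for normal (negative).
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Illustrative anomaly scores: higher means "more anomalous".
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auroc(scores, labels))  # 0.75
```

The quadratic loop is fine for small evaluation sets; for large ones, a sort-based rank computation gives the same value in O(n log n).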

LG · Jul 24, 2021
Tail of Distribution GAN (TailGAN): Generative-Adversarial-Network-Based Boundary Formation

Nikolaos Dionelis, Mehrdad Yaghoobi, Sotirios A. Tsaftaris

Generative Adversarial Networks (GANs) are a powerful methodology and can be used for unsupervised anomaly detection, where current techniques have limitations such as the accurate detection of anomalies near the tail of a distribution. GANs generally do not guarantee the existence of a probability density and are susceptible to mode collapse, while few GANs use likelihood to reduce mode collapse. In this paper, we create a GAN-based tail formation model for anomaly detection, the Tail of distribution GAN (TailGAN), to generate samples on the tail of the data distribution and detect anomalies near the support boundary. Using TailGAN, we leverage GANs for anomaly detection and use maximum entropy regularization. Using GANs that learn the probability of the underlying distribution has advantages in improving the anomaly detection methodology by allowing us to devise a generator for boundary samples, and use this model to characterize anomalies. TailGAN addresses supports with disjoint components and achieves competitive performance on images. We evaluate TailGAN for identifying Out-of-Distribution (OoD) data, and its performance on MNIST, CIFAR-10, Baggage X-Ray, and OoD data is competitive with methods from the literature.

LG · Jul 21, 2021
Boundary of Distribution Support Generator (BDSG): Sample Generation on the Boundary

Nikolaos Dionelis, Mehrdad Yaghoobi, Sotirios A. Tsaftaris

Generative models, such as Generative Adversarial Networks (GANs), have been used for unsupervised anomaly detection. While performance keeps improving, several limitations exist, particularly attributed to difficulties in capturing multimodal supports and in approximating the underlying distribution closer to the tails, i.e. the boundary of the distribution's support. This paper proposes an approach that attempts to alleviate such shortcomings. We propose an invertible-residual-network-based model, the Boundary of Distribution Support Generator (BDSG). GANs generally do not guarantee the existence of a probability distribution and here, we use the recently developed Invertible Residual Network (IResNet) and Residual Flow (ResFlow), for density estimation. These models have not yet been used for anomaly detection. We leverage IResNet and ResFlow for Out-of-Distribution (OoD) sample detection and for sample generation on the boundary using a compound loss function that forces the samples to lie on the boundary. The BDSG addresses non-convex support, disjoint components, and multimodal distributions. Results on synthetic data and data from multimodal distributions, such as MNIST and CIFAR-10, demonstrate competitive performance compared to methods from the literature.

SD · Oct 31, 2018
On Single-Channel Speech Enhancement and On Non-Linear Modulation-Domain Kalman Filtering

Nikolaos Dionelis

This report focuses on algorithms that perform single-channel speech enhancement. The author of this report uses modulation-domain Kalman filtering algorithms for speech enhancement, i.e. noise suppression and dereverberation, in [1], [2], [3], [4] and [5]. Modulation-domain Kalman filtering can be applied for both noise and late reverberation suppression and in [2], [1], [3] and [4], various model-based speech enhancement algorithms that perform modulation-domain Kalman filtering are designed, implemented and tested. The model-based enhancement algorithm in [2] estimates and tracks the speech phase. The short-time-Fourier-transform-based enhancement algorithm in [5] uses the active speech level estimator presented in [6]. This report describes how different algorithms perform speech enhancement and the algorithms discussed in this report are addressed to researchers interested in monaural speech enhancement. The algorithms are composed of different processing blocks and techniques [7]; understanding the implementation choices made during the system design is important because this provides insights that can assist the development of new algorithms. Index Terms - Speech enhancement, dereverberation, denoising, Kalman filter, minimum mean squared error estimation.

SD · Jul 26, 2018
Modulation-Domain Kalman Filtering for Monaural Blind Speech Denoising and Dereverberation

Nikolaos Dionelis, Mike Brookes

We describe a monaural speech enhancement algorithm based on modulation-domain Kalman filtering to blindly track the time-frequency log-magnitude spectra of speech and reverberation. We propose an adaptive algorithm that performs blind joint denoising and dereverberation, while accounting for the inter-frame speech dynamics, by estimating the posterior distribution of the speech log-magnitude spectrum given the log-magnitude spectrum of the noisy reverberant speech. The Kalman filter update step models the non-linear relations between the speech, noise and reverberation log-spectra. The Kalman filtering algorithm uses a signal model that takes into account the reverberation parameters of the reverberation time, $T_{60}$, and the direct-to-reverberant energy ratio (DRR) and also estimates and tracks the $T_{60}$ and the DRR in every frequency bin in order to improve the estimation of the speech log-magnitude spectrum. The Kalman filtering algorithm is tested and graphs that depict the estimated reverberation features over time are examined. The proposed algorithm is evaluated in terms of speech quality, speech intelligibility and dereverberation performance for a range of reverberation parameters and SNRs, in different noise types, and is also compared to competing denoising and dereverberation techniques. Experimental results using noisy reverberant speech demonstrate the effectiveness of the enhancement algorithm.
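The predict/update cycle underlying any Kalman tracker can be illustrated with a scalar, linear filter that follows one log-magnitude value across frames. This is a deliberately simplified sketch and not the algorithm of the paper: the abstract describes a non-linear update that jointly handles speech, noise and reverberation, whereas the code below assumes a linear observation model. The parameter names `a`, `q` and `r` and their default values are assumptions.

```python
def kalman_step(mean, var, obs, a=1.0, q=0.01, r=0.1):
    """One predict/update cycle of a scalar linear Kalman filter.

    mean, var: posterior mean and variance from the previous frame
    obs:       noisy observation for the current frame
    a:         state transition coefficient (inter-frame dynamics)
    q, r:      process and observation noise variances
    """
    # Predict: propagate the previous posterior through the dynamics.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Update: blend the prediction with the observation via the Kalman gain.
    gain = pred_var / (pred_var + r)
    new_mean = pred_mean + gain * (obs - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var

def track(observations, init_mean=0.0, init_var=1.0):
    """Run the filter over a sequence of frames and return the posterior means."""
    mean, var = init_mean, init_var
    estimates = []
    for obs in observations:
        mean, var = kalman_step(mean, var, obs)
        estimates.append(mean)
    return estimates

# Hypothetical per-frame observations for a single frequency bin.
print(track([1.0, 1.2, 0.9, 1.1]))
```

A non-linear update, as in the paper, would replace the gain-based blend with an approximation of the posterior under the non-linear observation model; the predict step keeps the same structure.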

SD · Aug 7, 2017
Phase-Aware Single-Channel Speech Enhancement with Modulation-Domain Kalman Filtering

Nikolaos Dionelis, Mike Brookes

We present a single-channel phase-sensitive speech enhancement algorithm that is based on modulation-domain Kalman filtering and on tracking the speech phase using circular statistics. With Kalman filtering, exploiting the fact that speech and noise are additive in the complex STFT domain, the algorithm tracks the speech log-spectrum, the noise log-spectrum and the speech phase. Joint amplitude and phase estimation of speech is performed. Given the noisy speech signal, conventional algorithms use the noisy phase for signal reconstruction, approximating the speech phase with the noisy phase. In the proposed Kalman filtering algorithm, the speech phase posterior is used to create an enhanced speech phase spectrum for signal reconstruction. The Kalman filter prediction models the temporal/inter-frame correlation of the speech and noise log-spectra and of the speech phase, while the Kalman filter update models their nonlinear relations. With the proposed algorithm, speech is tracked and estimated both in the log-spectral and spectral phase domains. The algorithm is evaluated in terms of speech quality and different algorithm configurations, dependent on the signal model, are compared in different noise types. Experimental results show that the proposed algorithm outperforms traditional enhancement algorithms over a range of SNRs for various noise types.
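Phase is an angle, so ordinary averaging fails near the wrap-around point at ±π; circular statistics address this by averaging unit vectors instead. The sketch below shows the two basic quantities, the circular mean and the mean resultant length. It illustrates circular statistics in general and is not the phase tracker of the paper; the function names are hypothetical.

```python
import math

def circular_mean(phases):
    """Mean direction of a set of angles in radians, in (-pi, pi]."""
    s = sum(math.sin(p) for p in phases)
    c = sum(math.cos(p) for p in phases)
    return math.atan2(s, c)

def mean_resultant_length(phases):
    """Concentration of the angles: 1 means identical, near 0 means spread out."""
    s = sum(math.sin(p) for p in phases)
    c = sum(math.cos(p) for p in phases)
    return math.hypot(s, c) / len(phases)

# Two phases on either side of the wrap-around point.
phases = [math.pi - 0.1, -math.pi + 0.1]
print(sum(phases) / len(phases))   # arithmetic mean lands near 0, the wrong side
print(circular_mean(phases))       # circular mean lands near +/- pi
print(mean_resultant_length(phases))
```

The mean resultant length is useful as a confidence measure for a phase estimate: values close to 1 indicate tightly clustered phases.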