Mikolaj Czerkawski

CV
h-index24
19papers
139citations
Novelty31%
AI Score41

19 Papers

CVNov 6, 2023
Exploring the Capability of Text-to-Image Diffusion Models with Structural Edge Guidance for Multi-Spectral Satellite Image Inpainting

Mikolaj Czerkawski, Christos Tachtatzis

The letter investigates the utility of text-to-image inpainting models for satellite image data. Two technical challenges of injecting structural guiding signals into the generative process as well as translating the inpainted RGB pixels to a wider set of MSI bands are addressed by introducing a novel inpainting framework based on StableDiffusion and ControlNet as well as a novel method for RGB-to-MSI translation. The results on a wider set of data suggest that the inpainting synthesized via StableDiffusion suffers from undesired artifacts and that a simple alternative of self-supervised internal inpainting achieves a higher quality of synthesis.

AO-PHOct 5, 2023
IceCloudNet: Cirrus and mixed-phase cloud prediction from SEVIRI input learned from sparse supervision

Kai Jeggle, Mikolaj Czerkawski, Federico Serva et al.

Clouds containing ice particles play a crucial role in the climate system. Yet they remain a source of great uncertainty in climate models and future climate projections. In this work, we create a new observational constraint of regime-dependent ice microphysical properties at the spatio-temporal coverage of geostationary satellite instruments and the quality of active satellite retrievals. We achieve this by training a convolutional neural network on three years of SEVIRI and DARDAR data sets. This work will enable novel research to improve ice cloud process understanding and hence, reduce uncertainties in a changing climate and help assess geoengineering methods for cirrus clouds.

CVNov 10, 2023
Diffusion Models for Earth Observation Use-cases: from cloud removal to urban change detection

Fulvio Sanguigni, Mikolaj Czerkawski, Lorenzo Papa et al.

The advancements in the state of the art of generative Artificial Intelligence (AI) brought by diffusion models can be highly beneficial in novel contexts involving Earth observation data. After introducing this new family of generative models, this work proposes and analyses three use cases which demonstrate the potential of diffusion-based approaches for satellite image data. Namely, we tackle cloud removal and inpainting, dataset generation for change-detection tasks, and urban replanning.

CVAug 1, 2023
Detecting Cloud Presence in Satellite Images Using the RGB-based CLIP Vision-Language Model

Mikolaj Czerkawski, Robert Atkinson, Christos Tachtatzis

This work explores capabilities of the pre-trained CLIP vision-language model to identify satellite images affected by clouds. Several approaches to using the model to perform cloud presence detection are proposed and evaluated, including a purely zero-shot operation with text prompts and several fine-tuning approaches. Furthermore, the transferability of the methods across different datasets and sensor types (Sentinel-2 and Landsat-8) is tested. The results that CLIP can achieve non-trivial performance on the cloud presence detection task with apparent capability to generalise across sensing modalities and sensing bands. It is also found that a low-cost fine-tuning stage leads to a strong increase in true negative rate. The results demonstrate that the representations learned by the CLIP model can be useful for satellite image processing tasks involving clouds.

CVSep 27, 2023
From LAION-5B to LAION-EO: Filtering Billions of Images Using Anchor Datasets for Satellite Image Extraction

Mikolaj Czerkawski, Alistair Francis

Large datasets, such as LAION-5B, contain a diverse distribution of images shared online. However, extraction of domain-specific subsets of large image corpora is challenging. The extraction approach based on an anchor dataset, combined with further filtering, is proposed here and demonstrated for the domain of satellite imagery. This results in the release of LAION-EO, a dataset sourced from the web containing pairs of text and satellite images in high (pixel-wise) resolution. The paper outlines the acquisition procedure as well as some of the features of the dataset.

CVFeb 19, 2024Code
Major TOM: Expandable Datasets for Earth Observation

Alistair Francis, Mikolaj Czerkawski

Deep learning models are increasingly data-hungry, requiring significant resources to collect and compile the datasets needed to train them, with Earth Observation (EO) models being no exception. However, the landscape of datasets in EO is relatively atomised, with interoperability made difficult by diverse formats and data structures. If ever larger datasets are to be built, and duplication of effort minimised, then a shared framework that allows users to combine and access multiple datasets is needed. Here, Major TOM (Terrestrial Observation Metaset) is proposed as this extensible framework. Primarily, it consists of a geographical indexing system based on a set of grid points and a metadata structure that allows multiple datasets with different sources to be merged. Besides the specification of Major TOM as a framework, this work also presents a large, open-access dataset, MajorTOM-Core, which covers the vast majority of the Earth's land surface. This dataset provides the community with both an immediately useful resource, as well as acting as a template for future additions to the Major TOM ecosystem. Access: https://huggingface.co/Major-TOM

CVApr 15, 2025Code
TerraMesh: A Planetary Mosaic of Multimodal Earth Observation Data

Benedikt Blumenstiel, Paolo Fraccaro, Valerio Marsocci et al.

Large-scale foundation models in Earth Observation can learn versatile, label-efficient representations by leveraging massive amounts of unlabeled data. However, existing public datasets are often limited in scale, geographic coverage, or sensor variety. We introduce TerraMesh, a new globally diverse, multimodal dataset combining optical, synthetic aperture radar, elevation, and land-cover modalities in an Analysis-Ready Data format. TerraMesh includes over 9~million samples with eight spatiotemporal aligned modalities, enabling large-scale pre-training. We provide detailed data processing steps, comprehensive statistics, and empirical evidence demonstrating improved model performance when pre-trained on TerraMesh. The dataset is hosted at https://huggingface.co/datasets/ibm-esa-geospatial/TerraMesh.

CVMar 31
EarthEmbeddingExplorer: A Web Application for Cross-Modal Retrieval of Global Satellite Images

Yijie Zheng, Weijie Wu, Bingyue Wu et al.

While the Earth observation community has witnessed a surge in high-impact foundation models and global Earth embedding datasets, a significant barrier remains in translating these academic assets into freely accessible tools. This tutorial introduces EarthEmbeddingExplorer, an interactive web application designed to bridge this gap, transforming static research artifacts into dynamic, practical workflows for discovery. We will provide a comprehensive hands-on guide to the system, detailing its cloud-native software architecture, demonstrating cross-modal queries (natural language, visual, and geolocation), and showcasing how to derive scientific insights from retrieval results. By democratizing access to precomputed Earth embeddings, this tutorial empowers researchers to seamlessly transition from state-of-the-art models and data archives to real-world application and analysis. The web application is available at https://modelscope.ai/studios/Major-TOM/EarthEmbeddingExplorer.

CVMar 3
COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data -- Generation Stochastic by Design

Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci et al.

Earth observation applications increasingly rely on data from multiple sensors, including optical, radar, elevation, and land-cover products. Relationships between these modalities are fundamental for data integration but are inherently non-injective: identical conditioning information can correspond to multiple physically plausible observations. Thus, such conditional mappings should be parametrised as data distributions. As a result, deterministic models tend to collapse toward conditional means and fail to represent the uncertainty and variability required for tasks such as data completion and cross-sensor translation. We introduce COP-GEN, a multimodal latent diffusion transformer that models the joint distribution of heterogeneous Earth Observation modalities at their native spatial resolutions. By parameterising cross-modal mappings as conditional distributions, COP-GEN enables flexible any-to-any conditional generation, including zero-shot modality translation, spectral band infilling, and generation under partial or missing inputs, without task-specific retraining. Experiments on a large-scale global multimodal dataset show that COP-GEN generates diverse yet physically consistent realisations while maintaining strong peak fidelity across optical, radar, and elevation modalities. Qualitative and quantitative analyses demonstrate that the model captures meaningful cross-modal structure and systematically adapts its output uncertainty as conditioning information increases. These results highlight the practical importance of stochastic generative modeling for Earth observation and motivate evaluation protocols that move beyond single-reference, pointwise metrics. Website: https:// miquel-espinosa.github.io/cop-gen

GRJan 30
HeatMat: Simulation of City Material Impact on Urban Heat Island Effect

Marie Reinbigler, Romain Rouffet, Peter Naylor et al.

The Urban Heat Island (UHI) effect, defined as a significant increase in temperature in urban environments compared to surrounding areas, is difficult to study in real cities using sensor data (satellites or in-situ stations) due to their coarse spatial and temporal resolution. Among the factors contributing to this effect are the properties of urban materials, which differ from those in rural areas. To analyze their individual impact and to test new material configurations, a high-resolution simulation at the city scale is required. Estimating the current materials used in a city, including those on building facades, is also challenging. We propose HeatMat, an approach to analyze at high resolution the individual impact of urban materials on the UHI effect in a real city, relying only on open data. We estimate building materials using street-view images and a pre-trained vision-language model (VLM) to supplement existing OpenStreetMap data, which describes the 2D geometry and features of buildings. We further encode this information into a set of 2D maps that represent the city's vertical structure and material characteristics. These maps serve as inputs for our 2.5D simulator, which models coupled heat transfers and enables random-access surface temperature estimation at multiple resolutions, reaching an x20 speedup compared to an equivalent simulation in 3D.

CVApr 12, 2024
Interference Motion Removal for Doppler Radar Vital Sign Detection Using Variational Encoder-Decoder Neural Network

Mikolaj Czerkawski, Christos Ilioudis, Carmine Clemente et al.

The treatment of interfering motion contributions remains one of the key challenges in the domain of radar-based vital sign monitoring. Removal of the interference to extract the vital sign contributions is demanding due to overlapping Doppler bands, the complex structure of the interference motions and significant variations in the power levels of their contributions. A novel approach to the removal of interference through the use of a probabilistic deep learning model is presented. Results show that a convolutional encoder-decoder neural network with a variational objective is capable of learning a meaningful representation space of vital sign Doppler-time distribution facilitating their extraction from a mixture signal. The approach is tested on semi-experimental data containing real vital sign signatures and simulated returns from interfering body motions. The application of the proposed network enhances the extraction of the micro-Doppler frequency corresponding to the respiration rate is demonstrated.

CVDec 7, 2024
Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space

Mikolaj Czerkawski, Marcin Kluczek, Jędrzej S. Bojanowski

With the ever-increasing volumes of the Earth observation data present in the archives of large programmes such as Copernicus, there is a growing need for efficient vector representations of the underlying raw data. The approach of extracting feature representations from pretrained deep neural networks is a powerful approach that can provide semantic abstractions of the input data. However, the way this is done for imagery archives containing geospatial data has not yet been defined. In this work, an extension is proposed to an existing community project, Major TOM, focused on the provision and standardization of open and free AI-ready datasets for Earth observation. Furthermore, four global and dense embedding datasets are released openly and for free along with the publication of this manuscript, resulting in the most comprehensive global open dataset of geospatial visual embeddings in terms of covered Earth's surface.

SPApr 12, 2024
A Novel Micro-Doppler Coherence Loss for Deep Learning Radar Applications

Mikolaj Czerkawski, Christos Ilioudis, Carmine Clemente et al.

Deep learning techniques are subject to increasing adoption for a wide range of micro-Doppler applications, where predictions need to be made based on time-frequency signal representations. Most, if not all, of the reported applications focus on translating an existing deep learning framework to this new domain with no adjustment made to the objective function. This practice results in a missed opportunity to encourage the model to prioritize features that are particularly relevant for micro-Doppler applications. Thus the paper introduces a micro-Doppler coherence loss, minimized when the normalized power of micro-Doppler oscillatory components between input and output is matched. The experiments conducted on real data show that the application of the introduced loss results in models more resilient to noise.

CVFeb 21, 2024
Robustness of Deep Neural Networks for Micro-Doppler Radar Classification

Mikolaj Czerkawski, Carmine Clemente, Craig Michie et al.

With the great capabilities of deep classifiers for radar data processing come the risks of learning dataset-specific features that do not generalize well. In this work, the robustness of two deep convolutional architectures, trained and tested on the same data, is evaluated. When standard training practice is followed, both classifiers exhibit sensitivity to subtle temporal shifts of the input representation, an augmentation that carries minimal semantic content. Furthermore, the models are extremely susceptible to adversarial examples. Both small temporal shifts and adversarial examples are a result of a model overfitting on features that do not generalize well. As a remedy, it is shown that training on adversarial examples and temporally augmented samples can reduce this effect and lead to models that generalise better. Finally, models operating on cadence-velocity diagram representation rather than Doppler-time are demonstrated to be naturally more immune to adversarial examples.

GRApr 9, 2025
MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data

Paul Borne--Pons, Mikolaj Czerkawski, Rosalie Martin et al.

Terrain modeling has traditionally relied on procedural techniques, which often require extensive domain expertise and handcrafted rules. In this paper, we present MESA - a novel data-centric alternative by training a diffusion model on global remote sensing data. This approach leverages large-scale geospatial information to generate high-quality terrain samples from text descriptions, showcasing a flexible and scalable solution for terrain generation. The model's capabilities are demonstrated through extensive experiments, highlighting its ability to generate realistic and diverse terrain landscapes. The dataset produced to support this work, the Major TOM Core-DEM extension dataset, is released openly as a comprehensive resource for global terrain data. The results suggest that data-driven models, trained on remote sensing data, can provide a powerful tool for realistic terrain modeling and generation.

GRApr 11, 2025
COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails

Miguel Espinosa, Valerio Marsocci, Yuru Jia et al.

In remote sensing, multi-modal data from various sensors capturing the same scene offers rich opportunities, but learning a unified representation across these modalities remains a significant challenge. Traditional methods have often been limited to single or dual-modality approaches. In this paper, we introduce COP-GEN-Beta, a generative diffusion model trained on optical, radar, and elevation data from the Major TOM dataset. What sets COP-GEN-Beta apart is its ability to map any subset of modalities to any other, enabling zero-shot modality translation after training. This is achieved through a sequence-based diffusion transformer, where each modality is controlled by its own timestep embedding. We extensively evaluate COP-GEN-Beta on thumbnail images from the Major TOM dataset, demonstrating its effectiveness in generating high-quality samples. Qualitative and quantitative evaluations validate the model's performance, highlighting its potential as a powerful pre-trained model for future remote sensing tasks.

CVApr 12, 2024
On Input Formats for Radar Micro-Doppler Signature Processing by Convolutional Neural Networks

Mikolaj Czerkawski, Carmine Clemente, Craig Michie et al.

Convolutional neural networks have often been proposed for processing radar Micro-Doppler signatures, most commonly with the goal of classifying the signals. The majority of works tend to disregard phase information from the complex time-frequency representation. Here, the utility of the phase information, as well as the optimal format of the Doppler-time input for a convolutional neural network, is analysed. It is found that the performance achieved by convolutional neural network classifiers is heavily influenced by the type of input representation, even across formats with equivalent information. Furthermore, it is demonstrated that the phase component of the Doppler-time representation contains rich information useful for classification and that unwrapping the phase in the temporal dimension can improve the results compared to a magnitude-only solution, improving accuracy from 0.920 to 0.938 on the tested human activity dataset. Further improvement of 0.947 is achieved by training a linear classifier on embeddings from multiple-formats.

CVDec 2, 2021
Neural Weight Step Video Compression

Mikolaj Czerkawski, Javier Cardona, Robert Atkinson et al.

A variety of compression methods based on encoding images as weights of a neural network have been recently proposed. Yet, the potential of similar approaches for video compression remains unexplored. In this work, we suggest a set of experiments for testing the feasibility of compressing video using two architectural paradigms, coordinate-based MLP (CbMLP) and convolutional network. Furthermore, we propose a novel technique of neural weight stepping, where subsequent frames of a video are encoded as low-entropy parameter updates. To assess the feasibility of the considered approaches, we will test the video compression performance on several high-resolution video datasets and compare against existing conventional and neural compression techniques.

CVSep 29, 2021
Neural Knitworks: Patched Neural Implicit Representation Networks

Mikolaj Czerkawski, Javier Cardona, Robert Atkinson et al.

Coordinate-based Multilayer Perceptron (MLP) networks, despite being capable of learning neural implicit representations, are not performant for internal image synthesis applications. Convolutional Neural Networks (CNNs) are typically used instead for a variety of internal generative tasks, at the cost of a larger model. We propose Neural Knitwork, an architecture for neural implicit representation learning of natural images that achieves image synthesis by optimizing the distribution of image patches in an adversarial manner and by enforcing consistency between the patch predictions. To the best of our knowledge, this is the first implementation of a coordinate-based MLP tailored for synthesis tasks such as image inpainting, super-resolution, and denoising. We demonstrate the utility of the proposed technique by training on these three tasks. The results show that modeling natural images using patches, rather than pixels, produces results of higher fidelity. The resulting model requires 80% fewer parameters than alternative CNN-based solutions while achieving comparable performance and training time.