QM | Jan 4, 2023 | Code
MSCDA: Multi-level Semantic-guided Contrast Improves Unsupervised Domain Adaptation for Breast MRI Segmentation in Small Datasets
Sheng Kuang, Henry C. Woodruff, Renee Granzier et al.
Deep learning (DL) applied to breast tissue segmentation in magnetic resonance imaging (MRI) has received increased attention in the last decade; however, the domain shift that arises from different vendors, acquisition protocols, and biological heterogeneity remains an important but challenging obstacle on the path towards clinical implementation. In this paper, we propose a novel Multi-level Semantic-guided Contrastive Domain Adaptation (MSCDA) framework to address this issue in an unsupervised manner. Our approach incorporates self-training with contrastive learning to align feature representations between domains. In particular, we extend the contrastive loss by incorporating pixel-to-pixel, pixel-to-centroid, and centroid-to-centroid contrasts to better exploit the underlying semantic information of the image at different levels. To resolve the data imbalance problem, we utilize a category-wise cross-domain sampling strategy to sample anchors from target images and build a hybrid memory bank to store samples from source images. We have validated MSCDA with a challenging task of cross-domain breast MRI segmentation between datasets of healthy volunteers and invasive breast cancer patients. Extensive experiments show that MSCDA effectively improves the model's feature alignment capabilities between domains, outperforming state-of-the-art methods. Furthermore, the framework is shown to be label-efficient, achieving good performance with a smaller source dataset. The code is publicly available at https://github.com/ShengKuangCN/MSCDA.
7.3 | LG | May 28 | Code
Beyond MSE: Improving Precipitation Nowcasting with Multi-Quantile Regression
Gijs van Nieuwkoop, Siamak Mehrkanoon
Deep-learning precipitation nowcasting models are often optimized using pointwise losses such as mean squared error or mean absolute error, which can lead to overly smooth forecasts and poor representation of heavy rainfall. This study investigates whether the predictive performance of an established deterministic nowcasting architecture can be improved by reformulating training as a multi-quantile regression problem. Using SmaAt-UNet as a core model, we compare MSE, MAE, and multi-quantile pinball-loss training on radar precipitation nowcasting over the Netherlands. The results show that multi-quantile training improves the central deterministic forecast, decreasing test-set MSE by 8.6% compared to a model trained using MSE, while also producing upper-quantile outputs that are useful for risk-sensitive prediction of heavy precipitation. These findings suggest that quantile regression provides a simple alternative to standard pointwise losses without requiring a new architecture or generative sampling procedure. The implementation of our models and training setup is available on GitHub: https://github.com/gijsvn/Multi-Quantile-Precipitation-Nowcasting.
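For readers unfamiliar with the pinball loss mentioned in the abstract above, the following is a minimal NumPy sketch of its standard textbook form for several quantiles at once. The function name `pinball_loss`, the array layout (one trailing slot per quantile), and the example quantile levels are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantiles):
    """Mean pinball (quantile) loss.

    y_true:    array of shape (...,)
    y_pred:    array of shape (..., Q), one prediction per quantile level
    quantiles: sequence of Q levels in (0, 1)
    """
    y_true = np.asarray(y_true, dtype=float)[..., None]
    y_pred = np.asarray(y_pred, dtype=float)
    q = np.asarray(quantiles, dtype=float)
    diff = y_true - y_pred  # positive when the model under-predicts
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    loss = np.maximum(q * diff, (q - 1.0) * diff)
    return float(loss.mean())
```

With a single quantile level of 0.5 this reduces to half the mean absolute error, which is why the median output can serve as a central deterministic forecast while higher levels penalize under-prediction more strongly.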
LG | Nov 30, 2023 | Code
TransCORALNet: A Two-Stream Transformer CORAL Networks for Supply Chain Credit Assessment Cold Start
Jie Shi, Arno P. J. M. Siebes, Siamak Mehrkanoon
This paper proposes an interpretable two-stream transformer CORAL network (TransCORALNet) for supply chain credit assessment under the segment industry and cold start problem. The model aims to provide accurate credit assessment prediction for new supply chain borrowers with limited historical data. Here, the two-stream domain adaptation architecture with correlation alignment (CORAL) loss is used as a core model and is equipped with a transformer, which provides insights about the learned features and allows efficient parallelization during training. Thanks to the domain adaptation capability of the proposed model, the domain shift between the source and target domains is minimized. Therefore, the model exhibits good generalization when the source and target do not follow the same distribution and only a limited number of labeled target instances exist. Furthermore, we employ Local Interpretable Model-agnostic Explanations (LIME) to provide more insight into the model prediction and identify the key features contributing to supply chain credit assessment decisions. The proposed model addresses four significant supply chain credit assessment challenges: domain shift, cold start, class imbalance, and interpretability. Experimental results on a real-world data set demonstrate the superiority of TransCORALNet over a number of state-of-the-art baselines in terms of accuracy. The code is available on GitHub: https://github.com/JieJieNiu/TransCORALN .
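The correlation alignment (CORAL) loss referred to above is commonly defined as the squared Frobenius distance between the source and target feature covariance matrices, scaled by 1/(4d²) for d-dimensional features. The sketch below follows that common definition; the function name `coral_loss` and the use of plain NumPy arrays for the two feature batches are assumptions for illustration, and how the loss is combined with the classification objective in TransCORALNet is not shown.

```python
import numpy as np

def coral_loss(source, target):
    """CORAL loss between two feature batches of shape (n_samples, d)."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    d = source.shape[1]
    # Unbiased covariance matrices; columns are treated as variables.
    cov_s = np.atleast_2d(np.cov(source, rowvar=False))
    cov_t = np.atleast_2d(np.cov(target, rowvar=False))
    # Squared Frobenius norm of the difference, scaled by 1 / (4 d^2).
    return float(np.sum((cov_s - cov_t) ** 2) / (4.0 * d * d))
```

Because only second-order statistics are compared, the loss ignores a pure shift in feature means and grows when the two domains differ in feature scale or correlation structure.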
11.7 | LG | Mar 23 | Code
SmaAT-QMix-UNet: A Parameter-Efficient Vector-Quantized UNet for Precipitation Nowcasting
Nikolas Stavrou, Siamak Mehrkanoon
Weather forecasting supports critical socioeconomic activities and complements environmental protection, yet operational Numerical Weather Prediction (NWP) systems remain computationally intensive, which makes them inefficient for certain applications. Meanwhile, recent advances in deep data-driven models have demonstrated promising results in nowcasting tasks. This paper presents SmaAT-QMix-UNet, an enhanced variant of SmaAt-UNet that introduces two key innovations: a vector quantization (VQ) bottleneck at the encoder-decoder bridge, and mixed kernel depth-wise convolutions (MixConv) replacing selected encoder and decoder blocks. These enhancements both reduce the model's size and improve its nowcasting performance. We train and evaluate SmaAT-QMix-UNet on a Dutch radar precipitation dataset (2016-2019), predicting precipitation 30 minutes ahead. Three configurations are benchmarked: using only VQ, only MixConv, and the full SmaAT-QMix-UNet. Grad-CAM saliency maps highlight the regions influencing each nowcast, while a UMAP embedding of the codewords illustrates how the VQ layer clusters encoder outputs. The source code for SmaAT-QMix-UNet is publicly available on GitHub: https://github.com/nstavr04/MasterThesisSnellius.
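The core operation of a vector quantization bottleneck, as mentioned in the abstract above, is a nearest-codeword lookup: each encoder output vector is replaced by the closest entry of a learned codebook. The following is a minimal sketch of that lookup only, assuming flattened feature vectors and a fixed codebook; the function name `quantize` is hypothetical, and codebook training (commitment loss, straight-through gradients) is deliberately omitted.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest codeword.

    features: array of shape (n, d)
    codebook: array of shape (k, d)
    Returns the quantized vectors (n, d) and the chosen indices (n,).
    """
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance between every feature and every codeword.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx
```

The returned indices are what a UMAP or cluster analysis of codeword usage would typically be built on.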
LG | Mar 12, 2023
SAR-UNet: Small Attention Residual UNet for Explainable Nowcasting Tasks
Mathieu Renault, Siamak Mehrkanoon
The accuracy and explainability of data-driven nowcasting models are of great importance in many socio-economic sectors reliant on weather-dependent decision making. This paper proposes a novel architecture called Small Attention Residual UNet (SAR-UNet) for precipitation and cloud cover nowcasting. Here, SmaAt-UNet is used as a core model and is further equipped with residual connections, parallel to the depthwise separable convolutions. The proposed SAR-UNet model is evaluated on two datasets, i.e., Dutch precipitation maps ranging from 2016 to 2019 and French cloud cover binary images from 2017 to 2018. The obtained results show that SAR-UNet outperforms other examined models in precipitation nowcasting from 30 to 180 minutes in the future as well as cloud cover nowcasting in the next 90 minutes. Furthermore, we provide additional insights on the nowcasts made by our proposed model using Grad-CAM, a visual explanation technique, which is employed on different levels of the encoder and decoder paths of the SAR-UNet model and produces heatmaps highlighting the critical regions in the input image as well as intermediate representations to the precipitation. The heatmaps generated by Grad-CAM reveal the interactions between the residual connections and the depthwise separable convolutions inside of the multiple depthwise separable blocks placed throughout the network architecture.
LG | Feb 8, 2023
WF-UNet: Weather Fusion UNet for Precipitation Nowcasting
Christos Kaparakis, Siamak Mehrkanoon
Designing early warning systems for harsh weather and its effects, such as urban flooding or landslides, requires accurate short-term forecasts (nowcasts) of precipitation. Nowcasting is a significant task with several environmental applications, such as agricultural management or increasing flight safety. In this study, we investigate the use of a UNet core model and its extension for precipitation nowcasting in western Europe for up to 3 hours ahead. In particular, we propose the Weather Fusion UNet (WF-UNet) model, which utilizes the Core 3D-UNet model and integrates precipitation and wind speed variables as input in the learning process, and we analyze their influence on the precipitation target task. We have collected six years of precipitation and wind radar images from Jan 2016 to Dec 2021 of 14 European countries, with 1-hour temporal resolution and 31 square km spatial resolution based on the ERA5 dataset, provided by Copernicus, the European Union's Earth observation programme. We compare the proposed WF-UNet model to a persistence model as well as other UNet-based architectures that are trained using only precipitation radar input data. The obtained results show that WF-UNet achieves 22%, 8% and 6% lower MSE than the best-performing of the other examined architectures at horizons of 1, 2 and 3 hours, respectively.
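A persistence model, used as a baseline in the abstract above, simply assumes that the most recent observation stays unchanged over the forecast horizon. The sketch below shows such a baseline together with MSE and a relative MSE reduction of the kind quoted in the results. All function names and the `(time, height, width)` array layout are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def persistence_forecast(history, horizon):
    """Repeat the last observed frame `horizon` times.

    history: array of shape (time, height, width)
    """
    history = np.asarray(history, dtype=float)
    return np.repeat(history[-1:], horizon, axis=0)

def mse(prediction, target):
    """Mean squared error over all pixels and time steps."""
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((prediction - target) ** 2))

def relative_mse_reduction(mse_model, mse_reference):
    """Percentage by which mse_model is lower than mse_reference."""
    return 100.0 * (mse_reference - mse_model) / mse_reference
```

For example, a model MSE of 0.78 against a reference MSE of 1.0 corresponds to a 22% reduction.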
SD | Jul 8, 2022
BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
Sheng Kuang, Jie Shi, Kiki van der Heijden et al.
Accurate sound localization in a reverberant environment is essential for human auditory perception. Recently, Convolutional Neural Networks (CNNs) have been utilized to model the binaural human auditory pathway. However, CNNs have difficulty capturing global acoustic features. To address this issue, we propose a novel end-to-end Binaural Audio Spectrogram Transformer (BAST) model to predict the sound azimuth in both anechoic and reverberant environments. Two modes of implementation, i.e., BAST-SP and BAST-NSP, corresponding to the BAST model with shared and non-shared parameters, respectively, are explored. Our model with subtraction interaural integration and hybrid loss achieves an angular distance of 1.29 degrees and a Mean Square Error of 1e-3 at all azimuths, significantly surpassing the CNN-based model. The exploratory analysis of BAST's performance on the left-right hemifields and in anechoic and reverberant environments shows its generalization ability as well as the feasibility of binaural Transformers in sound localization. Furthermore, an analysis of the attention maps is provided to give additional insights into the interpretation of the localization process in a natural reverberant environment.
LG | Apr 28, 2022
GCN-FFNN: A Two-Stream Deep Model for Learning Solution to Partial Differential Equations
Onur Bilgin, Thomas Vergutz, Siamak Mehrkanoon
This paper introduces a novel two-stream deep model based on graph convolutional network (GCN) architecture and feed-forward neural networks (FFNN) for learning the solution of nonlinear partial differential equations (PDEs). The model aims at incorporating both graph and grid input representations using two streams corresponding to GCN and FFNN models, respectively. Each stream layer receives and processes its own input representation. As opposed to FFNN which receives a grid-like structure, the GCN stream layer operates on graph input data where the neighborhood information is incorporated through the adjacency matrix of the graph. In this way, the proposed GCN-FFNN model learns from two types of input representations, i.e. grid and graph data, obtained via the discretization of the PDE domain. The GCN-FFNN model is trained in two phases. In the first phase, the model parameters of each stream are trained separately. Both streams employ the same error function to adjust their parameters by enforcing the models to satisfy the given PDE as well as its initial and boundary conditions on grid or graph collocation (training) data. In the second phase, the learned parameters of two-stream layers are frozen and their learned representation solutions are fed to fully connected layers whose parameters are learned using the previously used error function. The learned GCN-FFNN model is tested on test data located both inside and outside the PDE domain. The obtained numerical results demonstrate the applicability and efficiency of the proposed GCN-FFNN model over individual GCN and FFNN models on 1D-Burgers, 1D-Schrödinger, 2D-Burgers and 2D-Schrödinger equations.
9.7 | LG | Apr 11
A Diffusion-Contrastive Graph Neural Network with Virtual Nodes for Wind Nowcasting in Unobserved Regions
Jie Shi, Siamak Mehrkanoon
Accurate weather nowcasting remains one of the central challenges in atmospheric science, with critical implications for climate resilience, energy security, and disaster preparedness. Since it is not feasible to deploy observation stations everywhere, some regions lack dense observational networks, resulting in unreliable short-term wind predictions across those unobserved areas. Here we present a deep graph self-supervised framework that extends nowcasting capability into such unobserved regions without requiring new sensors. Our approach introduces "virtual nodes" into a diffusion and contrastive-based graph neural network, enabling the model to learn wind condition (i.e., speed, direction and gusts) in places with no direct measurements. Using high-temporal resolution weather station data across the Netherlands, we demonstrate that this approach reduces nowcast mean absolute error (MAE) of wind speed, gusts, and direction in unobserved regions by more than 30% - 46% compared with interpolation and regression methods. By enabling localized nowcasts where no measurements exist, this method opens new pathways for renewable energy integration, agricultural planning, and early-warning systems in data-sparse regions.
LG | Dec 16, 2025
AnySleep: a channel-agnostic deep learning system for high-resolution sleep staging in multi-center cohorts
Niklas Grieger, Jannik Raskob, Siamak Mehrkanoon et al.
Sleep is essential for good health throughout our lives, yet studying its dynamics requires manual sleep staging, a labor-intensive step in sleep research and clinical care. Across centers, polysomnography (PSG) recordings are traditionally scored in 30-s epochs for pragmatic, not physiological, reasons and can vary considerably in electrode count, montage, and subject characteristics. These constraints present challenges in conducting harmonized multi-center sleep studies and discovering novel, robust biomarkers on shorter timescales. Here, we present AnySleep, a deep neural network model that uses any electroencephalography (EEG) or electrooculography (EOG) data to score sleep at adjustable temporal resolutions. We trained and validated the model on over 19,000 overnight recordings from 21 datasets collected across multiple clinics, spanning nearly 200,000 hours of EEG and EOG data, to promote robust generalization across sites. The model attains state-of-the-art performance and surpasses or equals established baselines at 30-s epochs. Performance improves as more channels are provided, yet remains strong when EOG is absent or when only EOG or single EEG derivations (frontal, central, or occipital) are available. On sub-30-s timescales, the model captures short wake intrusions consistent with arousals and improves prediction of physiological characteristics (age, sex) and pathophysiological conditions (sleep apnea), relative to standard 30-s scoring. We make the model publicly available to facilitate large-scale studies with heterogeneous electrode setups and to accelerate the discovery of novel biomarkers in sleep.
SP | Sep 23, 2024
Dual Stream Graph Transformer Fusion Networks for Enhanced Brain Decoding
Lucas Goene, Siamak Mehrkanoon
This paper presents the novel Dual Stream Graph-Transformer Fusion (DS-GTF) architecture designed specifically for classifying task-based Magnetoencephalography (MEG) data. In the spatial stream, inputs are initially represented as graphs, which are then passed through graph attention networks (GAT) to extract spatial patterns. Two methods, TopK and Thresholded Adjacency, are introduced for initializing the adjacency matrix used in the GAT. In the temporal stream, the Transformer Encoder receives concatenated windowed input MEG data and learns new temporal representations. The learned temporal and spatial representations from both streams are fused before reaching the output layer. Experimental results demonstrate an improvement in classification performance and a reduction in standard deviation across multiple test subjects compared to other examined models.
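The abstract above names two adjacency initialization methods, TopK and Thresholded Adjacency, without defining them. One plausible reading, sketched below purely as an assumption, builds both from the absolute Pearson correlation between sensor channels: the top-k variant keeps each channel's k strongest neighbours, and the thresholded variant keeps every pair above a cutoff. The function names, the choice of correlation as the similarity measure, and the symmetrization step are all hypothetical and may differ from the paper.

```python
import numpy as np

def _abs_correlation(signals):
    """Absolute channel-by-channel correlation with a zeroed diagonal."""
    corr = np.abs(np.corrcoef(np.asarray(signals, dtype=float)))
    np.fill_diagonal(corr, 0.0)  # no self-loops
    return corr

def topk_adjacency(signals, k):
    """Binary adjacency linking each channel to its k most correlated peers.

    signals: array of shape (channels, time)
    """
    corr = _abs_correlation(signals)
    nearest = np.argsort(-corr, axis=1)[:, :k]
    adj = np.zeros_like(corr)
    rows = np.arange(corr.shape[0])[:, None]
    adj[rows, nearest] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected

def thresholded_adjacency(signals, threshold):
    """Binary adjacency linking channels whose |correlation| exceeds threshold."""
    return (_abs_correlation(signals) > threshold).astype(float)
```

The top-k variant guarantees a minimum degree for every node, while the thresholded variant lets the graph density follow the data.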
LG | Apr 25, 2025 | Code
SSA-UNet: Advanced Precipitation Nowcasting via Channel Shuffling
Marco Turzi, Siamak Mehrkanoon
Weather forecasting is essential for facilitating diverse socio-economic activities and environmental conservation initiatives. Deep learning techniques are increasingly being explored as complementary approaches to Numerical Weather Prediction (NWP) models, offering potential benefits such as reduced complexity and enhanced adaptability in specific applications. This work presents a novel design, Small Shuffled Attention UNet (SSA-UNet), which enhances SmaAt-UNet's architecture by including a shuffle channeling mechanism to optimize performance and diminish complexity. To assess its efficacy, this architecture and its reduced variant are examined and trained on two datasets: a Dutch precipitation dataset from 2016 to 2019, and a French cloud cover dataset containing radar images from 2017 to 2018. Three output configurations of the proposed architecture are evaluated, yielding outputs of 1, 6, and 12 precipitation maps, respectively. To better understand how this model operates and produces its predictions, a gradient-based approach called Grad-CAM is used to analyze the outputs generated. The analysis of heatmaps generated by Grad-CAM facilitated the identification of regions within the input maps that the model considers most informative for generating its predictions. The implementation of SSA-UNet can be found on GitHub: https://github.com/MarcoTurzi/SSA-UNet.
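The abstract above does not spell out its shuffle channeling mechanism. A widely used form of channel shuffling, popularized by ShuffleNet, interleaves the channels of different groups so that information can mix across grouped convolutions. The sketch below shows that operation on a NumPy array in `(batch, channels, height, width)` layout; treating it as what SSA-UNet uses is an assumption, and the function name `channel_shuffle` is chosen for illustration.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across `groups` (ShuffleNet-style).

    x: array of shape (batch, channels, height, width)
    """
    n, c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channels must be divisible by groups")
    # Split channels into (groups, channels_per_group), swap the two axes,
    # then flatten back so consecutive channels come from different groups.
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)
```

With six channels and two groups, the channel order 0, 1, 2, 3, 4, 5 becomes 0, 3, 1, 4, 2, 5. The operation has no parameters, so it adds mixing without increasing model size.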
SP | May 8, 2025 | Code
From Sleep Staging to Spindle Detection: Evaluating End-to-End Automated Sleep Analysis
Niklas Grieger, Siamak Mehrkanoon, Philipp Ritter et al.
Automation of sleep analysis, including both macrostructural (sleep stages) and microstructural (e.g., sleep spindles) elements, promises to enable large-scale sleep studies and to reduce variance due to inter-rater incongruencies. While individual steps, such as sleep staging and spindle detection, have been studied separately, the feasibility of automating multi-step sleep analysis remains unclear. Here, we evaluate whether a fully automated analysis using state-of-the-art machine learning models for sleep staging (RobustSleepNet) and subsequent spindle detection (SUMOv2) can replicate findings from an expert-based study of bipolar disorder. The automated analysis qualitatively reproduced key findings from the expert-based study, including significant differences in fast spindle densities between bipolar patients and healthy controls, accomplishing in minutes what previously took months to complete manually. While the results of the automated analysis differed quantitatively from the expert-based study, possibly due to biases between expert raters or between raters and the models, the models individually performed at or above inter-rater agreement for both sleep staging and spindle detection. Our results demonstrate that fully automated approaches have the potential to facilitate large-scale sleep research. We are providing public access to the tools used in our automated analysis by sharing our code and introducing SomnoBot, a privacy-preserving sleep analysis platform.
LG | Jan 15, 2024
Graph Dual-stream Convolutional Attention Fusion for Precipitation Nowcasting
Lorand Vatamany, Siamak Mehrkanoon
Accurate precipitation nowcasting is crucial for applications such as flood prediction, disaster management, agriculture optimization, and transportation management. While many studies have approached this task using sequence-to-sequence models, most focus on single regions, ignoring correlations between disjoint areas. We reformulate precipitation nowcasting as a spatiotemporal graph sequence problem. Specifically, we propose Graph Dual-stream Convolutional Attention Fusion, a novel extension of the graph attention network. Our model's dual-stream design employs distinct attention mechanisms for spatial and temporal interactions, capturing their unique dynamics. A gated fusion module integrates both streams, leveraging spatial and temporal information for improved predictive accuracy. Additionally, our framework enhances graph attention by directly processing three-dimensional tensors within graph nodes, removing the need for reshaping. This capability enables handling complex, high-dimensional data and exploiting higher-order correlations between data dimensions. Depthwise-separable convolutions are also incorporated to refine local feature extraction and efficiently manage high-dimensional inputs. We evaluate our model using seven years of precipitation data from Copernicus Climate Change Services, covering Europe and neighboring regions. Experimental results demonstrate superior performance of our approach compared to other models. Moreover, visualizations of seasonal spatial and temporal attention scores provide insights into the most significant connections between regions and time steps.
SP | Dec 15, 2023
A novel dual-stream time-frequency contrastive pretext tasks framework for sleep stage classification
Sergio Kazatzidis, Siamak Mehrkanoon
Self-supervised learning addresses the challenge encountered by many supervised methods, i.e. the requirement of large amounts of annotated data. This challenge is particularly pronounced in fields such as the electroencephalography (EEG) research domain. Self-supervised learning operates instead by utilizing pseudo-labels, which are generated by pretext tasks, to obtain a rich and meaningful data representation. In this study, we aim at introducing a dual-stream pretext task architecture that operates both in the time and frequency domains. In particular, we have examined the incorporation of the novel Frequency Similarity (FS) pretext task into two existing pretext tasks, Relative Positioning (RP) and Temporal Shuffling (TS). We assess the accuracy of these models using the Physionet Challenge 2018 (PC18) dataset in the context of the downstream task sleep stage classification. The inclusion of FS resulted in a notable improvement in downstream task accuracy, with a 1.28 percent improvement on RP and a 2.02 percent improvement on TS. Furthermore, when visualizing the learned embeddings using Uniform Manifold Approximation and Projection (UMAP), distinct clusters emerge, indicating that the learned representations carry meaningful information.
3.7 | LG | Apr 7
EEG-MFTNet: An Enhanced EEGNet Architecture with Multi-Scale Temporal Convolutions and Transformer Fusion for Cross-Session Motor Imagery Decoding
Panagiotis Andrikopoulos, Siamak Mehrkanoon
Brain-computer interfaces (BCIs) enable direct communication between the brain and external devices, providing critical support for individuals with motor impairments. However, accurate motor imagery (MI) decoding from electroencephalography (EEG) remains challenging due to noise and cross-session variability. This study introduces EEG-MFTNet, a novel deep learning model based on the EEGNet architecture, enhanced with multi-scale temporal convolutions and a Transformer encoder stream. These components are designed to capture both short and long-range temporal dependencies in EEG signals. The model is evaluated on the SHU dataset using a subject-dependent cross-session setup, outperforming baseline models, including EEGNet and its recent derivatives. EEG-MFTNet achieves an average classification accuracy of 58.9% while maintaining low computational complexity and inference latency. The results highlight the model's potential for real-time BCI applications and underscore the importance of architectural innovations in improving MI decoding. This work contributes to the development of more robust and adaptive BCI systems, with implications for assistive technologies and neurorehabilitation.
LG | Feb 22, 2025
Integrating Weather Station Data and Radar for Precipitation Nowcasting: SmaAt-fUsion and SmaAt-Krige-GNet
Aleksej Cornelissen, Jie Shi, Siamak Mehrkanoon
In recent years, data-driven, deep learning-based approaches for precipitation nowcasting have attracted significant attention, showing promising results. However, many existing models fail to fully exploit the extensive atmospheric information available, relying primarily on precipitation data alone. This study introduces two novel deep learning architectures, SmaAt-fUsion and SmaAt-Krige-GNet, specifically designed to enhance precipitation nowcasting by integrating multi-variable weather station data with radar datasets. By leveraging additional meteorological information, these models improve representation learning in the latent space, resulting in enhanced nowcasting performance. The SmaAt-fUsion model extends the SmaAt-UNet framework by incorporating weather station data through a convolutional layer, integrating it into the bottleneck of the network. Conversely, the SmaAt-Krige-GNet model combines precipitation maps with weather station data processed using Kriging, a geo-statistical interpolation method, to generate variable-specific maps. These maps are then utilized in a dual-encoder architecture based on SmaAt-GNet, allowing multi-level data integration. Experimental evaluations were conducted using four years (2016--2019) of weather station and precipitation radar data from the Netherlands. Results demonstrate that SmaAt-Krige-GNet outperforms the standard SmaAt-UNet, which relies solely on precipitation radar data, in low precipitation scenarios, while SmaAt-fUsion surpasses SmaAt-UNet in both low and high precipitation scenarios. This highlights the potential of incorporating discrete weather station data to enhance the performance of deep learning-based weather nowcasting models.
LG | Mar 3
MAD-SmaAt-GNet: A Multimodal Advection-Guided Neural Network for Precipitation Nowcasting
Samuel van Wonderen, Siamak Mehrkanoon
Precipitation nowcasting (short-term forecasting) is still often performed using numerical solvers for physical equations, which are computationally expensive and make limited use of the large volumes of available weather data. Deep learning models have shown strong potential for precipitation nowcasting, offering both accuracy and computational efficiency. Among these models, convolutional neural networks (CNNs) are particularly effective for image-to-image prediction tasks. The SmaAt-UNet is a lightweight CNN based architecture that has demonstrated strong performance for precipitation nowcasting. This paper introduces the Multimodal Advection-Guided Small Attention GNet (MAD-SmaAt-GNet), which extends the core SmaAt-UNet by (i) incorporating an additional encoder to learn from multiple weather variables and (ii) integrating a physics-based advection component to ensure physically consistent predictions. We show that each extension individually improves rainfall forecasts and that their combination yields further gains. MAD-SmaAt-GNet reduces the mean squared error (MSE) by 8.9% compared with the baseline SmaAt-UNet for four-step precipitation forecasting up to four hours ahead. Additionally, experiments indicate that multimodal inputs are particularly beneficial for short lead times, while the advection-based component enhances performance across both short and long forecasting horizons.
LG | Aug 19, 2025
Trans-XFed: An Explainable Federated Learning for Supply Chain Credit Assessment
Jie Shi, Arno P. J. M. Siebes, Siamak Mehrkanoon
This paper proposes a Trans-XFed architecture that combines federated learning with explainable AI techniques for supply chain credit assessment. The proposed model aims to address several key challenges, including privacy, information silos, class imbalance, non-identically and independently distributed (Non-IID) data, and model interpretability in supply chain credit assessment. We introduce a performance-based client selection strategy (PBCS) to tackle class imbalance and Non-IID problems. This strategy achieves faster convergence by selecting clients with higher local F1 scores. The FedProx architecture, enhanced with homomorphic encryption, is used as the core model, and further incorporates a transformer encoder. The transformer encoder block provides insights into the learned features. Additionally, we employ the integrated gradient explainable AI technique to offer insights into decision-making. We demonstrate the effectiveness of Trans-XFed through experimental evaluations on real-world supply chain datasets. The obtained results show its ability to deliver accurate credit assessments compared to several baselines, while maintaining transparency and privacy.
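The abstract above describes performance-based client selection (PBCS) as choosing clients with higher local F1 scores. A minimal sketch of that idea follows: compute each client's F1 from its local confusion counts, then keep the top-ranked clients for the next federated round. The function names, the dictionary input, and the fixed number of selected clients are assumptions for illustration; the selection schedule and aggregation used in Trans-XFed are not reproduced here.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def select_clients(local_f1, num_selected):
    """Return the ids of the clients with the highest local F1.

    local_f1: dict mapping client id -> F1 score on that client's data
    Ties are broken by client id so the result is deterministic.
    """
    ranked = sorted(local_f1, key=lambda cid: (-local_f1[cid], cid))
    return ranked[:num_selected]
```

For instance, with local scores of 0.4, 0.9 and 0.7 for clients "a", "b" and "c", selecting two clients yields "b" and "c".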
LG | Dec 20, 2024
Self-supervised Spatial-Temporal Learner for Precipitation Nowcasting
Haotian Li, Arno Siebes, Siamak Mehrkanoon
Nowcasting, the short-term prediction of weather, is essential for making timely and weather-dependent decisions. Specifically, precipitation nowcasting aims to predict precipitation at a local level within a 6-hour time frame. This task can be framed as a spatial-temporal sequence forecasting problem, where deep learning methods have been particularly effective. However, despite advancements in self-supervised learning, most successful methods for nowcasting remain fully supervised. Self-supervised learning is advantageous for pretraining models to learn representations without requiring extensive labeled data. In this work, we leverage the benefits of self-supervised learning and integrate it with spatial-temporal learning to develop a novel model, SpaT-SparK. SpaT-SparK comprises a CNN-based encoder-decoder structure pretrained with a masked image modeling (MIM) task and a translation network that captures temporal relationships among past and future precipitation maps in downstream tasks. We conducted experiments on the NL-50 dataset to evaluate the performance of SpaT-SparK. The results demonstrate that SpaT-SparK outperforms existing baseline supervised models, such as SmaAt-UNet, providing more accurate nowcasting predictions.
LG | Mar 13, 2024
Data-Efficient Sleep Staging with Synthetic Time Series Pretraining
Niklas Grieger, Siamak Mehrkanoon, Stephan Bialonski
Analyzing electroencephalographic (EEG) time series can be challenging, especially with deep neural networks, due to the large variability among human subjects and often small datasets. To address these challenges, various strategies, such as self-supervised learning, have been suggested, but they typically rely on extensive empirical datasets. Inspired by recent advances in computer vision, we propose a pretraining task termed "frequency pretraining" to pretrain a neural network for sleep staging by predicting the frequency content of randomly generated synthetic time series. Our experiments demonstrate that our method surpasses fully supervised learning in scenarios with limited data and few subjects, and matches its performance in regimes with many subjects. Furthermore, our results underline the relevance of frequency information for sleep stage scoring, while also demonstrating that deep neural networks utilize information beyond frequencies to enhance sleep staging performance, which is consistent with previous research. We anticipate that our approach will be advantageous across a broad spectrum of applications where EEG data is limited or derived from a small number of subjects, including the domain of brain-computer interfaces.
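To make the idea of predicting the frequency content of randomly generated synthetic time series concrete, here is one way such training pairs could be produced: pick a random subset of frequency bins, sum one sinusoid per chosen bin, and use the multi-hot bin vector as the label. The sampling rate, duration, bin layout, and the function name `make_sample` are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def make_sample(rng, n_bins=20, fs=100.0, duration=30.0, max_freq=30.0):
    """Generate one synthetic signal and its multi-hot frequency label.

    rng: a numpy.random.Generator
    Returns (signal of shape (fs * duration,), labels of shape (n_bins,)).
    """
    edges = np.linspace(0.0, max_freq, n_bins + 1)   # bin boundaries in Hz
    labels = rng.integers(0, 2, size=n_bins)         # 1 = bin is present
    t = np.arange(int(fs * duration)) / fs
    signal = np.zeros_like(t)
    for b in np.flatnonzero(labels):
        freq = rng.uniform(edges[b], edges[b + 1])   # random frequency in bin
        phase = rng.uniform(0.0, 2.0 * np.pi)
        signal += np.sin(2.0 * np.pi * freq * t + phase)
    return signal, labels
```

A network pretrained to recover `labels` from `signal` never sees real EEG during pretraining, which is what makes this kind of scheme attractive when empirical data are scarce.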
LG | Jan 18, 2024
GA-SmaAt-GNet: Generative Adversarial Small Attention GNet for Extreme Precipitation Nowcasting
Eloy Reulen, Siamak Mehrkanoon
In recent years, data-driven modeling approaches have gained significant attention across various meteorological applications, particularly in weather forecasting. However, these methods often face challenges in handling extreme weather conditions. In response, we present the GA-SmaAt-GNet model, a novel generative adversarial framework for extreme precipitation nowcasting. This model features a unique SmaAt-GNet generator, an extension of the successful SmaAt-UNet architecture, capable of integrating precipitation masks (binarized precipitation maps) to enhance predictive accuracy. Additionally, GA-SmaAt-GNet incorporates an attention-augmented discriminator inspired by the Pix2Pix architecture. This innovative framework paves the way for generative precipitation nowcasting using multiple data sources. We evaluate the performance of SmaAt-GNet and GA-SmaAt-GNet using real-life precipitation data from the Netherlands, revealing notable improvements in overall performance and for extreme precipitation events compared to other models. Specifically, our proposed architecture demonstrates its main performance gain in summer and autumn, when precipitation intensity is typically at its peak. Furthermore, we conduct uncertainty analysis on the GA-SmaAt-GNet model and the precipitation dataset, providing insights into its predictive capabilities. Finally, we employ Grad-CAM to offer visual explanations of our model's predictions, generating activation heatmaps that highlight areas of input activation throughout the network.
LGFeb 10, 2022
AA-TransUNet: Attention Augmented TransUNet For Nowcasting Tasks
Yimin Yang, Siamak Mehrkanoon
Data-driven modeling approaches have recently gained a lot of attention in many challenging meteorological applications, including weather element forecasting. This paper introduces a novel data-driven predictive model based on TransUNet for the precipitation nowcasting task. The TransUNet model, which combines the Transformer and U-Net models, has previously been applied successfully to medical segmentation tasks. Here, TransUNet is used as a core model and is further equipped with Convolutional Block Attention Modules (CBAM) and Depthwise-separable Convolution (DSC). The proposed Attention Augmented TransUNet (AA-TransUNet) model is evaluated on two distinct datasets: the Dutch precipitation map dataset and the French cloud cover dataset. The obtained results show that the proposed model outperforms the other examined models on both datasets. Furthermore, an uncertainty analysis of the proposed AA-TransUNet is provided to give additional insight into its predictions.
LGAug 16, 2021
Multistream Graph Attention Networks for Wind Speed Forecasting
Dogan Aykas, Siamak Mehrkanoon
Reliable and accurate wind speed prediction has a significant impact on many sectors, such as the economic, business, and management sectors, among others. This paper presents a new model for wind speed prediction based on Graph Attention Networks (GAT). In particular, the proposed model extends the GAT architecture by equipping it with a learnable adjacency matrix and by incorporating a new attention mechanism that yields attention scores per weather variable. The output of the GAT-based model is combined with an LSTM layer in order to exploit both the spatial and temporal characteristics of the multivariate multidimensional historical weather data. Real weather data collected from several cities in Denmark and the Netherlands are used to conduct the experiments and evaluate the performance of the proposed model. We show that, in comparison to previous architectures used for wind speed prediction, the proposed model is better able to learn the complex input-output relationships of the weather data. Furthermore, thanks to the learned attention weights, the model provides additional insights into the most important weather variables and cities for the studied prediction task.
LGJun 28, 2021
TENT: Tensorized Encoder Transformer for Temperature Forecasting
Onur Bilgin, Paweł Mąka, Thomas Vergutz et al.
Reliable weather forecasting is of great importance in science, business, and society. The best performing data-driven models for weather prediction tasks rely on recurrent or convolutional neural networks, some of which incorporate attention mechanisms. In this work, we introduce a novel model based on the Transformer architecture for weather forecasting. The proposed Tensorial Encoder Transformer (TENT) model is equipped with tensorial attention and thus exploits the spatiotemporal structure of weather data by processing it in a multidimensional tensorial format. We show that, compared to the classical encoder transformer, 3D convolutional neural networks, LSTM, and Convolutional LSTM, the proposed TENT model can better learn the underlying complex pattern of the weather data for the studied temperature prediction task. Experiments are performed on two real-life weather datasets. The datasets consist of historical measurements from weather stations in the USA, Canada and Europe. The first dataset contains hourly measurements of weather attributes for 30 cities in the USA and Canada from October 2012 to November 2017. The second dataset contains daily measurements of weather attributes for 18 cities across Europe from May 2005 to April 2020. Two attention scores are introduced based on the obtained tensorial attention and are visualized in order to shed light on the decision-making process of our model and to provide insight into which cities are most important for the target cities.
LGFeb 21, 2021
Symbolic regression for scientific discovery: an application to wind speed forecasting
Ismail Alaoui Abdellaoui, Siamak Mehrkanoon
Symbolic regression refers to an ensemble of techniques that make it possible to uncover an analytical equation from data. Through a closed-form formula, these techniques provide great advantages such as the potential scientific discovery of new laws, explainability, feature engineering, and fast inference. Similarly, deep learning based techniques have shown an extraordinary ability to model complex patterns. The present paper aims at applying a recent end-to-end symbolic regression technique, i.e. the equation learner (EQL), to obtain an analytical equation for wind speed forecasting. We show that it is possible to derive an analytical equation that achieves reasonable accuracy for short-term horizon predictions using only a small number of features.
LGFeb 12, 2021
Broad-UNet: Multi-scale feature learning for nowcasting tasks
Jesus Garcia Fernandez, Siamak Mehrkanoon
Weather nowcasting consists of predicting meteorological components in the short term at high spatial resolutions. Due to its influence on many human activities, accurate nowcasting has recently gained plenty of attention. In this paper, we treat the nowcasting problem as an image-to-image translation problem using satellite imagery. We introduce Broad-UNet, a novel architecture based on the core UNet model, to efficiently address this problem. In particular, the proposed Broad-UNet is equipped with asymmetric parallel convolutions as well as an Atrous Spatial Pyramid Pooling (ASPP) module. In this way, the Broad-UNet model learns more complex patterns by combining multi-scale features while using fewer parameters than the core UNet model. The proposed model is applied to two different nowcasting tasks, i.e. precipitation map and cloud cover nowcasting. The obtained numerical results show that the introduced Broad-UNet model produces more accurate predictions than the other examined architectures.
LGJan 25, 2021
Deep Graph Convolutional Networks for Wind Speed Prediction
Tomasz Stańczyk, Siamak Mehrkanoon
Wind speed prediction and forecasting is important for various business and management sectors. In this paper, we introduce new models for wind speed prediction based on graph convolutional networks (GCNs). Given hourly data of several weather variables acquired from multiple weather stations, wind speed values are predicted for multiple time steps ahead. In particular, the weather stations are treated as nodes of a graph whose associated adjacency matrix is learnable. In this way, the network learns the graph spatial structure and determines the strength of relations between the weather stations based on the historical weather data. We add a self-loop connection to the learnt adjacency matrix and normalize the adjacency matrix. We examine two scenarios with the self-loop connection setting (two separate models). In the first scenario, the self-loop connection is imposed as a constant additive. In the second scenario a learnable parameter is included to enable the network to decide about the self-loop connection strength. Furthermore, we incorporate data from multiple time steps with temporal convolution, which together with spatial graph convolution constitutes spatio-temporal graph convolution. We perform experiments on real datasets collected from weather stations located in cities in Denmark and the Netherlands. The numerical experiments show that our proposed models outperform previously developed baseline models on the referenced datasets. We provide additional insights by visualizing learnt adjacency matrices from each layer of our models.
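As a rough illustration of the self-loop and normalization step described in this abstract, the sketch below adds a scaled identity to an adjacency matrix and then normalizes it. The abstract does not say which normalization is used, so the symmetric form D^{-1/2} A D^{-1/2} is an assumption here, as are the function names, the example weights, and the `strength` parameter (which stands in for either the constant additive or the learnable self-loop weight).

```python
import math


def add_self_loops(adj, strength=1.0):
    """Return adj + strength * I.

    `strength` is a hypothetical stand-in for either a fixed constant
    (first scenario) or a learned scalar (second scenario).
    """
    n = len(adj)
    return [
        [adj[i][j] + (strength if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def symmetric_normalize(adj):
    """Return D^{-1/2} A D^{-1/2}, with D the diagonal matrix of row sums.

    Assumed normalization; the abstract does not specify the exact form.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    return [
        [inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical adjacency between three weather stations (non-negative weights).
adjacency = [
    [0.0, 0.5, 0.2],
    [0.5, 0.0, 0.3],
    [0.2, 0.3, 0.0],
]

normalized = symmetric_normalize(add_self_loops(adjacency, strength=1.0))
```

In a trainable model both `adjacency` and `strength` would be parameters updated by gradient descent; plain lists are used here only to keep the arithmetic visible.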
LGNov 6, 2020
Deep coastal sea elements forecasting using U-Net based models
Jesús García Fernández, Ismail Alaoui Abdellaoui, Siamak Mehrkanoon
The supply and demand of energy are influenced by meteorological conditions. The relevance of accurate weather forecasts increases as the demand for renewable energy sources increases. Energy providers and policy makers require weather information to make informed choices and establish optimal plans according to their operational objectives. Due to the recent development of deep learning techniques applied to satellite imagery, weather forecasting that uses remote sensing data has also seen major progress. The present paper investigates multi-step-ahead frame prediction for coastal sea elements in the Netherlands using U-Net based architectures. Hourly data from the Copernicus observation programme spanning a period of 2 years have been used to train the models and produce the forecasts, including seasonal predictions. We propose a variation of the U-Net architecture and further extend this novel model using residual connections, parallel convolutions and asymmetric convolutions in order to introduce three additional architectures. In particular, we show that the architecture equipped with parallel and asymmetric convolutions as well as skip connections outperforms the other three discussed models.
LGSep 23, 2020
Deep multi-stations weather forecasting: explainable recurrent convolutional neural networks
Ismail Alaoui Abdellaoui, Siamak Mehrkanoon
Deep learning applied to weather forecasting has started gaining popularity because of the progress achieved by data-driven models. The present paper compares two different deep learning architectures for weather prediction on daily data gathered from 18 cities across Europe and spanning a period of 15 years. We propose the Deep Attention Unistream Multistream (DAUM) networks, which investigate different types of input representations (i.e. tensorial unistream vs. multistream) as well as the incorporation of the attention mechanism. In particular, we show that adding a self-attention block within the models increases the overall forecasting performance. Furthermore, visualization techniques such as occlusion analysis and score maximization are used to give additional insight into the most important features and cities for predicting a particular target feature of the target cities.
LGJul 13, 2020
Deep Neural-Kernel Machines
Siamak Mehrkanoon
In this chapter we review the main literature related to the recent advancement of the deep neural-kernel architecture, an approach that seeks the synergy between two powerful classes of models, i.e. kernel-based models and artificial neural networks. The introduced deep neural-kernel framework is composed of a hybridization of the neural network architecture and a kernel machine. More precisely, for the kernel counterpart the model is based on Least Squares Support Vector Machines with explicit feature mapping. Here we discuss the use of one form of an explicit feature map obtained by random Fourier features. Thanks to this explicit feature map, on the one hand bridging the two architectures becomes more straightforward, and on the other hand one can find the solution of the associated optimization problem in the primal, therefore making the model scalable to large-scale datasets. We begin by introducing a neural-kernel architecture that serves as the core module for deeper models equipped with different pooling layers. In particular, we review three neural-kernel machines with average, maxout and convolutional pooling layers. In the average pooling layer, the outputs of the previous representation layers are averaged. The maxout layer triggers competition among different input representations and allows the formation of multiple sub-networks within the same model. The convolutional pooling layer reduces the dimensionality of the multi-scale output representations. Comparisons with the neural-kernel model, kernel-based models and the classical neural network architecture have been made, and the numerical experiments illustrate the effectiveness of the introduced models on several benchmark datasets.
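To make the idea of an explicit feature map from random Fourier features concrete, here is a minimal sketch, not the chapter's implementation. It assumes a Gaussian (RBF) kernel with bandwidth `sigma`, which the abstract does not specify; the function names, dimensions, and sample points are likewise illustrative. The inner product of two mapped points approximates the kernel value, which is what allows a model to be solved in the primal.

```python
import math
import random


def make_rff_map(input_dim, num_features, sigma, seed=0):
    """Build a random Fourier feature map for an assumed RBF kernel.

    Frequencies are drawn from N(0, 1/sigma^2) and phases from U[0, 2*pi].
    """
    rng = random.Random(seed)
    weights = [
        [rng.gauss(0.0, 1.0 / sigma) for _ in range(input_dim)]
        for _ in range(num_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        # z_k(x) = sqrt(2/D) * cos(w_k . x + b_k)
        return [
            scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


def rbf_kernel(x, y, sigma):
    """Exact RBF kernel exp(-||x - y||^2 / (2 sigma^2)) for comparison."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


feature_map = make_rff_map(input_dim=2, num_features=4000, sigma=1.0)
x, y = [0.3, -0.1], [0.0, 0.4]

# Inner product in the explicit feature space approximates the kernel value.
approx = sum(a * b for a, b in zip(feature_map(x), feature_map(y)))
exact = rbf_kernel(x, y, sigma=1.0)
```

The approximation error shrinks roughly as one over the square root of `num_features`, so the number of features trades accuracy against the size of the primal problem.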
LGJul 8, 2020
SmaAt-UNet: Precipitation Nowcasting using a Small Attention-UNet Architecture
Kevin Trebing, Tomasz Stanczyk, Siamak Mehrkanoon
Weather forecasting is dominated by numerical weather prediction, which tries to accurately model the physical properties of the atmosphere. A downside of numerical weather prediction is that it lacks the ability to produce short-term forecasts using the latest available information. By using a data-driven neural network approach, we show that it is possible to produce an accurate precipitation nowcast. To this end, we propose SmaAt-UNet, an efficient convolutional neural network based on the well-known UNet architecture, equipped with attention modules and depthwise-separable convolutions. We evaluate our approach on real-life datasets using precipitation maps from the region of the Netherlands and binary images of cloud coverage of France. The experimental results show that, in terms of prediction performance, the proposed model is comparable to the other examined models while using only a quarter of the trainable parameters.
LGJul 4, 2020
Wind speed prediction using multidimensional convolutional neural networks
Kevin Trebing, Siamak Mehrkanoon
Accurate wind speed forecasting is of great importance for many economic, business and management sectors. This paper introduces a new model based on convolutional neural networks (CNNs) for wind speed prediction tasks. In particular, we show that compared to classical CNN-based models, the proposed model is able to better characterise the spatio-temporal evolution of the wind data by learning the underlying complex input-output relationships from multiple dimensions (views) of the input data. The proposed model exploits the spatio-temporal multivariate multidimensional historical weather data for learning new representations used for wind forecasting. We conduct experiments on two real-life weather datasets. The datasets are measurements from cities in Denmark and in the Netherlands. The proposed model is compared with traditional 2- and 3-dimensional CNN models, a 2D-CNN model with an attention layer and a 2D-CNN model equipped with upscaling and depthwise separable convolutions.
LGJul 2, 2020
Deep brain state classification of MEG data
Ismail Alaoui Abdellaoui, Jesus Garcia Fernandez, Caner Sahinli et al.
Neuroimaging techniques have been shown to be useful when studying the brain's activity. This paper uses Magnetoencephalography (MEG) data, provided by the Human Connectome Project (HCP), in combination with various deep artificial neural network models to perform brain decoding. More specifically, we investigate to what extent we can infer the task performed by a subject based on their MEG data. Three models are proposed and examined: one based on compact convolution, one based on a combined convolutional and long short-term architecture, and one based on multi-view learning that aims at fusing the outputs of the two stream networks. These models exploit the spatio-temporal MEG data for learning new representations that are used to decode the relevant tasks across subjects. In order to identify the most relevant features of the input signals, two attention mechanisms, i.e. self and global attention, are incorporated in all the models. The experimental results of cross-subject multi-class classification on the studied MEG dataset show that the inclusion of attention improves the generalization of the models across subjects.
MLMar 7, 2015
Higher order Matching Pursuit for Low Rank Tensor Learning
Yuning Yang, Siamak Mehrkanoon, Johan A. K. Suykens
Low rank tensor learning, such as tensor completion and multilinear multitask learning, has received much attention in recent years. In this paper, we propose higher order matching pursuit for low rank tensor learning problems with a convex or a nonconvex cost function, which is a generalization of matching pursuit type methods. At each iteration, the main cost of the proposed methods is only the computation of a rank-one tensor, which can be done efficiently, making the proposed methods scalable to large-scale problems. Moreover, storing the resulting rank-one tensors requires little storage, which can help to break the curse of dimensionality. The linear convergence rate of the proposed methods is established in various circumstances. Along with the main methods, we also provide a method of low computational complexity for approximately computing the rank-one tensors, with a provable approximation ratio, which helps to improve the efficiency of the main methods and to analyze the convergence rate. Experimental results on synthetic as well as real datasets verify the efficiency and effectiveness of the proposed methods.