Omar Faruque

LG
h-index24
11papers
44citations
Novelty49%
AI Score47

11 Papers

CVAug 26, 2024Code
gWaveNet: Classification of Gravity Waves from Noisy Satellite Data using Custom Kernel Integrated Deep Learning Method

Seraj Al Mahmud Mostafa, Omar Faruque, Chenxi Wang et al.

Atmospheric gravity waves occur in the Earths atmosphere caused by an interplay between gravity and buoyancy forces. These waves have profound impacts on various aspects of the atmosphere, including the patterns of precipitation, cloud formation, ozone distribution, aerosols, and pollutant dispersion. Therefore, understanding gravity waves is essential to comprehend and monitor changes in a wide range of atmospheric behaviors. Limited studies have been conducted to identify gravity waves from satellite data using machine learning techniques. Particularly, without applying noise removal techniques, it remains an underexplored area of research. This study presents a novel kernel design aimed at identifying gravity waves within satellite images. The proposed kernel is seamlessly integrated into a deep convolutional neural network, denoted as gWaveNet. Our proposed model exhibits impressive proficiency in detecting images containing gravity waves from noisy satellite data without any feature engineering. The empirical results show our model outperforms related approaches by achieving over 98% training accuracy and over 94% test accuracy which is known to be the best result for gravity waves detection up to the time of this work. We open sourced our code at https://rb.gy/qn68ku.

LGSep 19, 2024
Hybrid Ensemble Deep Graph Temporal Clustering for Spatiotemporal Data

Francis Ndikum Nji, Omar Faruque, Mostafa Cham et al.

Classifying subsets based on spatial and temporal features is crucial to the analysis of spatiotemporal data given the inherent spatial and temporal variability. Since no single clustering algorithm ensures optimal results, researchers have increasingly explored the effectiveness of ensemble approaches. Ensemble clustering has attracted much attention due to increased diversity, better generalization, and overall improved clustering performance. While ensemble clustering may yield promising results on simple datasets, it has not been fully explored on complex multivariate spatiotemporal data. For our contribution to this field, we propose a novel hybrid ensemble deep graph temporal clustering (HEDGTC) method for multivariate spatiotemporal data. HEDGTC integrates homogeneous and heterogeneous ensemble methods and adopts a dual consensus approach to address noise and misclassification from traditional clustering. It further applies a graph attention autoencoder network to improve clustering performance and stability. When evaluated on three real-world multivariate spatiotemporal data, HEDGTC outperforms state-of-the-art ensemble clustering models by showing improved performance and stability with consistent results. This indicates that HEDGTC can effectively capture implicit temporal patterns in complex spatiotemporal data.

LGApr 27, 2023
Deep Spatiotemporal Clustering: A Temporal Clustering Approach for Multi-dimensional Climate Data

Omar Faruque, Francis Ndikum Nji, Mostafa Cham et al.

Clustering high-dimensional spatiotemporal data using an unsupervised approach is a challenging problem for many data-driven applications. Existing state-of-the-art methods for unsupervised clustering use different similarity and distance functions but focus on either spatial or temporal features of the data. Concentrating on joint deep representation learning of spatial and temporal features, we propose Deep Spatiotemporal Clustering (DSC), a novel algorithm for the temporal clustering of high-dimensional spatiotemporal data using an unsupervised deep learning method. Inspired by the U-net architecture, DSC utilizes an autoencoder integrating CNN-RNN layers to learn latent representations of the spatiotemporal data. DSC also includes a unique layer for cluster assignment on latent representations that uses the Student's t-distribution. By optimizing the clustering loss and data reconstruction loss simultaneously, the algorithm gradually improves clustering assignments and the nonlinear mapping between low-dimensional latent feature space and high-dimensional original data space. A multivariate spatiotemporal climate dataset is used to evaluate the efficacy of the proposed method. Our extensive experiments show our approach outperforms both conventional and deep learning-based unsupervised clustering algorithms. Additionally, we compared the proposed model with its various variants (CNN encoder, CNN autoencoder, CNN-RNN encoder, CNN-RNN autoencoder, etc.) to get insight into using both the CNN and RNN layers in the autoencoder, and our proposed technique outperforms these variants in terms of clustering results.

AIFeb 22, 2023
Quantifying Causes of Arctic Amplification via Deep Learning based Time-series Causal Inference

Sahara Ali, Omar Faruque, Yiyi Huang et al.

The warming of the Arctic, also known as Arctic amplification, is led by several atmospheric and oceanic drivers. However, the details of its underlying thermodynamic causes are still unknown. Inferring the causal effects of atmospheric processes on sea ice melt using fixed treatment effect strategies leads to unrealistic counterfactual estimations. Such models are also prone to bias due to time-varying confoundedness. Further, the complex non-linearity in Earth science data makes it infeasible to perform causal inference using existing marginal structural techniques. In order to tackle these challenges, we propose TCINet - time-series causal inference model to infer causation under continuous treatment using recurrent neural networks and a novel probabilistic balancing technique. Through experiments on synthetic and observational data, we show how our research can substantially improve the ability to quantify leading causes of Arctic sea ice melt, further paving paths for causal inference in observational Earth science.

DATA-ANApr 3, 2024Code
Causality for Earth Science -- A Review on Time-series and Spatiotemporal Causality Methods

Sahara Ali, Uzma Hasan, Xingyan Li et al.

This survey paper covers the breadth and depth of time-series and spatiotemporal causality methods, and their applications in Earth Science. More specifically, the paper presents an overview of causal discovery and causal inference, explains the underlying causal assumptions, and enlists evaluation techniques and key terminologies of the domain area. The paper elicits the various state-of-the-art methods introduced for time-series and spatiotemporal causal analysis along with their strengths and limitations. The paper further describes the existing applications of several methods for answering specific Earth Science questions such as extreme weather events, sea level rise, teleconnections etc. This survey paper can serve as a primer for Data Science researchers interested in data-driven causal study as we share a list of resources, such as Earth Science datasets (synthetic, simulated and observational data) and open source tools for causal analysis. It will equally benefit the Earth Science community interested in taking an AI-driven approach to study the causality of different dynamic and thermodynamic processes as we present the open challenges and opportunities in performing causality-based Earth Science study.

4.1CLApr 29
HealthNLP_Retrievers at ArchEHR-QA 2026: Cascaded LLM Pipeline for Grounded Clinical Question Answering

Md Biplob Hosen, Md Alomgeer Hussein, Md Akmol Masud et al.

Patient portals now give individuals direct access to their electronic health records (EHRs), yet access alone does not ensure patients understand or act on the complex clinical information contained in these records. The ArchEHR-QA 2026 shared task addresses this challenge by focusing on grounded question answering over EHRs, and this paper presents the system developed by the HealthNLP_Retrievers team for this task. The proposed approach uses a multi-stage cascaded pipeline powered by the Gemini 2.5 Pro large language model to interpret patient-authored questions and retrieve relevant evidence from lengthy clinical notes. Our architecture comprises four integrated modules: (1) a few-shot query reformulation unit which summarizes verbose patient queries; (2) a heuristic-based evidence scorer which ranks clinical sentences to prioritize recall; (3) a grounded response generator which synthesizes professional-caliber answers restricted strictly to identified evidence; and (4) a high-precision many-to-many alignment framework which links generated answers to supporting clinical sentences. This cascaded approach achieved competitive results. Across the individual tracks, the system ranked 1st in question interpretation, 5th in answer generation, 7th in evidence identification, and 9th in answer-evidence alignment. These results show that integrating large language models within a structured multi-stage pipeline improves grounding, precision, and the professional quality of patient-oriented health communication. To support reproducibility, our source code is publicly available in our GitHub repository

49.3LGApr 27
TTCD:Transformer Integrated Temporal Causal Discovery from Non-Stationary Time Series Data

Omar Faruque, Sahara Ali, Xue Zheng et al.

The widespread availability of complex time series data in various domains such as environmental science, epidemiology, and economics demands robust causal discovery methods that can identify intricate contemporaneous and lagged relationships in non-stationary, nonlinear, and noisy settings. Existing constraint-based methods often rely heavily on conditional independence tests that degrade for limited data samples and complex distributions, while score-based methods impose strong statistical assumptions. Recent methods address special cases such as change point detection or distribution shifts, but struggle to provide a unified solution. We propose the Transformer Integrated Temporal Causal Discovery (TTCD) Framework, a novel end-to-end approach that learns contemporaneous and lagged causal relations from non-stationary time series. TTCD introduces a Non-Stationary Feature Learner integrating temporal and frequency-domain attention with dynamic non-stationarity profiling, and a custom Causal Structure Learner. A key innovation is reconstruction-guided causal signal distillation, to distill essential causal signals through the reconstruction process of the transformer decoder, which mitigates noise and spurious correlations while preserving meaningful dependencies. The Causal Structure Learner operates on distilled reconstructed signals to infer the underlying causal graph without restrictive assumptions on noise distributions or data generation processes. Experiments on synthetic, benchmark, and real world datasets show that TTCD consistently outperforms state-of-the-art baselines in both accuracy and consistency with domain knowledge, demonstrating the approach's effectiveness for causal discovery in challenging real world contexts.

CVMay 29, 2025
DeepTopoNet: A Framework for Subglacial Topography Estimation on the Greenland Ice Sheets

Bayu Adhi Tama, Mansa Krishna, Homayra Alam et al.

Understanding Greenland's subglacial topography is critical for projecting the future mass loss of the ice sheet and its contribution to global sea-level rise. However, the complex and sparse nature of observational data, particularly information about the bed topography under the ice sheet, significantly increases the uncertainty in model projections. Bed topography is traditionally measured by airborne ice-penetrating radar that measures the ice thickness directly underneath the aircraft, leaving data gap of tens of kilometers in between flight lines. This study introduces a deep learning framework, which we call as DeepTopoNet, that integrates radar-derived ice thickness observations and BedMachine Greenland data through a novel dynamic loss-balancing mechanism. Among all efforts to reconstruct bed topography, BedMachine has emerged as one of the most widely used datasets, combining mass conservation principles and ice thickness measurements to generate high-resolution bed elevation estimates. The proposed loss function adaptively adjusts the weighting between radar and BedMachine data, ensuring robustness in areas with limited radar coverage while leveraging the high spatial resolution of BedMachine predictions i.e. bed estimates. Our approach incorporates gradient-based and trend surface features to enhance model performance and utilizes a CNN architecture designed for subgrid-scale predictions. By systematically testing on the Upernavik Isstrøm) region, the model achieves high accuracy, outperforming baseline methods in reconstructing subglacial terrain. This work demonstrates the potential of deep learning in bridging observational gaps, providing a scalable and efficient solution to inferring subglacial topography.

LGMay 13, 2024
Estimating Direct and Indirect Causal Effects of Spatiotemporal Interventions in Presence of Spatial Interference

Sahara Ali, Omar Faruque, Jianwu Wang

Spatial interference (SI) occurs when the treatment at one location affects the outcomes at other locations. Accounting for spatial interference in spatiotemporal settings poses further challenges as interference violates the stable unit treatment value assumption, making it infeasible for standard causal inference methods to quantify the effects of time-varying treatment at spatially varying outcomes. In this paper, we first formalize the concept of spatial interference in case of time-varying treatment assignments by extending the potential outcome framework under the assumption of no unmeasured confounding. We then propose our deep learning based potential outcome model for spatiotemporal causal inference. We utilize latent factor modeling to reduce the bias due to time-varying confounding while leveraging the power of U-Net architecture to capture global and local spatial interference in data over time. Our causal estimators are an extension of average treatment effect (ATE) for estimating direct (DATE) and indirect effects (IATE) of spatial interference on treated and untreated data. Being the first of its kind deep learning based spatiotemporal causal inference technique, our approach shows advantages over several baseline methods based on the experiment results on two synthetic datasets, with and without spatial interference. Our results on real-world climate dataset also align with domain knowledge, further demonstrating the effectiveness of our proposed method.

LGApr 1, 2024
TS-CausalNN: Learning Temporal Causal Relations from Non-linear Non-stationary Time Series Data

Omar Faruque, Sahara Ali, Xue Zheng et al.

The growing availability and importance of time series data across various domains, including environmental science, epidemiology, and economics, has led to an increasing need for time-series causal discovery methods that can identify the intricate relationships in the non-stationary, non-linear, and often noisy real world data. However, the majority of current time series causal discovery methods assume stationarity and linear relations in data, making them infeasible for the task. Further, the recent deep learning-based methods rely on the traditional causal structure learning approaches making them computationally expensive. In this paper, we propose a Time-Series Causal Neural Network (TS-CausalNN) - a deep learning technique to discover contemporaneous and lagged causal relations simultaneously. Our proposed architecture comprises (i) convolutional blocks comprising parallel custom causal layers, (ii) acyclicity constraint, and (iii) optimization techniques using the augmented Lagrangian approach. In addition to the simple parallel design, an advantage of the proposed model is that it naturally handles the non-stationarity and non-linearity of the data. Through experiments on multiple synthetic and real world datasets, we demonstrate the empirical proficiency of our proposed approach as compared to several state-of-the-art methods. The inferred graphs for the real world dataset are in good agreement with the domain understanding.

CVSep 10, 2025
Improving Greenland Bed Topography Mapping with Uncertainty-Aware Graph Learning on Sparse Radar Data

Bayu Adhi Tama, Homayra Alam, Mostafa Cham et al.

Accurate maps of Greenland's subglacial bed are essential for sea-level projections, but radar observations are sparse and uneven. We introduce GraphTopoNet, a graph-learning framework that fuses heterogeneous supervision and explicitly models uncertainty via Monte Carlo dropout. Spatial graphs built from surface observables (elevation, velocity, mass balance) are augmented with gradient features and polynomial trends to capture both local variability and broad structure. To handle data gaps, we employ a hybrid loss that combines confidence-weighted radar supervision with dynamically balanced regularization. Applied to three Greenland subregions, GraphTopoNet outperforms interpolation, convolutional, and graph-based baselines, reducing error by up to 60 percent while preserving fine-scale glacial features. The resulting bed maps improve reliability for operational modeling, supporting agencies engaged in climate forecasting and policy. More broadly, GraphTopoNet shows how graph machine learning can convert sparse, uncertain geophysical observations into actionable knowledge at continental scale.