LGApr 17Code
Dynamic Sheaf Diffusion Networks with Adaptive Local Structure for Heterogeneous Spatio-Temporal Graph LearningAbeer Mostafa, Raneen Younis, Zahra Ahmadi
Spatio-temporal processes often exhibit highly heterogeneous and non-intuitive responses to localized disruptions, limiting the effectiveness of conventional message passing approaches in modeling local heterogeneity. We reformulate spatio-temporal forecasting as the problem of learning information flow over locally structured spaces, rather than propagating globally aligned node representations. To this end, we introduce a spatio-temporal sheaf diffusion graph neural network (ST-Sheaf GNN) that embeds graph topology into sheaf-based vector spaces connected by learned linear restriction maps. Unlike prior approaches relying on static or globally shared transformations, our model learns dynamic restriction maps that evolve over time and adapt to local spatio-temporal patterns, enabling more expressive interactions. The proposed framework both theoretically guarantees and empirically demonstrates evidence that the proposed diffusion mechanism mitigates oversmoothing, preserving discriminative node representations even with increasing diffusion layer depth. Experiments on diverse real-world spatio-temporal forecasting benchmarks across multiple domains demonstrate state-of-the-art performance, highlighting the effectiveness of sheaf topological representations as a principled foundation for spatio-temporal graph learning. The code is available at: https://anonymous.4open.science/r/ST-SheafGNN-6523/.
LGJun 6, 2023
MTS2Graph: Interpretable Multivariate Time Series Classification with Temporal Evolving GraphsRaneen Younis, Abdul Hakmeh, Zahra Ahmadi
Conventional time series classification approaches based on bags of patterns or shapelets face significant challenges in dealing with a vast amount of feature candidates from high-dimensional multivariate data. In contrast, deep neural networks can learn low-dimensional features efficiently, and in particular, Convolutional Neural Networks (CNN) have shown promising results in classifying Multivariate Time Series (MTS) data. A key factor in the success of deep neural networks is this astonishing expressive power. However, this power comes at the cost of complex, black-boxed models, conflicting with the goals of building reliable and human-understandable models. An essential criterion in understanding such predictive deep models involves quantifying the contribution of time-varying input variables to the classification. Hence, in this work, we introduce a new framework for interpreting multivariate time series data by extracting and clustering the input representative patterns that highly activate CNN neurons. This way, we identify each signal's role and dependencies, considering all possible combinations of signals in the MTS input. Then, we construct a graph that captures the temporal relationship between the extracted patterns for each layer. An effective graph merging strategy finds the connection of each node to the previous layer's nodes. Finally, a graph embedding algorithm generates new representations of the created interpretable time-series features. To evaluate the performance of our proposed framework, we run extensive experiments on eight datasets of the UCR/UEA archive, along with HAR and PAM datasets. The experiments indicate the benefit of our time-aware graph-based representation in MTS classification while enriching them with more interpretability.
CVMay 12
Cross-Modal-Domain Generalization Through Semantically Aligned Discrete RepresentationsSouptik Sen, Raneen Younis, Zahra Ahmadi
Multimodal learning seeks to integrate information across diverse sensory sources, yet current approaches struggle to balance cross-modal generalizability with modality-specific structure. Continuous (implicit) methods preserve fine-grained priors but render generalization challenging, while discrete (explicit) approaches enforce shared prototypes at the expense of modality specificity. We introduce CoDAAR (Cross-modal Discrete Alignment And Reconstruction), a novel framework that resolves this long-standing trade-off by establishing semantic consensus across modality-specific codebooks through index-level alignment. This design uniquely allows CoDAAR to preserve modality-unique structures while achieving generalizable cross-modal representations within a unified discrete space. CoDAAR combines two complementary mechanisms: Discrete Temporal Alignment (DTA), which enables fine-grained temporal quantization, and Cascading Semantic Alignment (CSA), which promotes progressive cross-modal semantic agreement. Together, they establish a competition-free unified representation space. Trained with self-supervised reconstruction objectives on paired multimodal sequences, CoDAAR demonstrates robust cross-modal and cross-domain generalization. Across Cross-Modal Generalization benchmarks, including event classification, localization, video segmentation, and cross-dataset transfer, CoDAAR achieves state-of-the-art performance, establishing a new paradigm for discrete and generalizable multimodal representation learning.
LGFeb 16
Orthogonalized Multimodal Contrastive Learning with Asymmetric Masking for Structured RepresentationsCarolin Cissee, Raneen Younis, Zahra Ahmadi
Multimodal learning seeks to integrate information from heterogeneous sources, where signals may be shared across modalities, specific to individual modalities, or emerge only through their interaction. While self-supervised multimodal contrastive learning has achieved remarkable progress, most existing methods predominantly capture redundant cross-modal signals, often neglecting modality-specific (unique) and interaction-driven (synergistic) information. Recent extensions broaden this perspective, yet they either fail to explicitly model synergistic interactions or learn different information components in an entangled manner, leading to incomplete representations and potential information leakage. We introduce \textbf{COrAL}, a principled framework that explicitly and simultaneously preserves redundant, unique, and synergistic information within multimodal representations. COrAL employs a dual-path architecture with orthogonality constraints to disentangle shared and modality-specific features, ensuring a clean separation of information components. To promote synergy modeling, we introduce asymmetric masking with complementary view-specific patterns, compelling the model to infer cross-modal dependencies rather than rely solely on redundant cues. Extensive experiments on synthetic benchmarks and diverse MultiBench datasets demonstrate that COrAL consistently matches or outperforms state-of-the-art methods while exhibiting low performance variance across runs. These results indicate that explicitly modeling the full spectrum of multimodal information yields more stable, reliable, and comprehensive embeddings.
CVNov 10, 2025
Learning from the Right Patches: A Two-Stage Wavelet-Driven Masked Autoencoder for Histopathology Representation LearningRaneen Younis, Louay Hamdi, Lukas Chavez et al.
Whole-slide images are central to digital pathology, yet their extreme size and scarce annotations make self-supervised learning essential. Masked Autoencoders (MAEs) with Vision Transformer backbones have recently shown strong potential for histopathology representation learning. However, conventional random patch sampling during MAE pretraining often includes irrelevant or noisy regions, limiting the model's ability to capture meaningful tissue patterns. In this paper, we present a lightweight and domain-adapted framework that brings structure and biological relevance into MAE-based learning through a wavelet-informed patch selection strategy. WISE-MAE applies a two-step coarse-to-fine process: wavelet-based screening at low magnification to locate structurally rich regions, followed by high-resolution extraction for detailed modeling. This approach mirrors the diagnostic workflow of pathologists and improves the quality of learned representations. Evaluations across multiple cancer datasets, including lung, renal, and colorectal tissues, show that WISE-MAE achieves competitive representation quality and downstream classification performance while maintaining efficiency under weak supervision.