LGSep 6, 2023
Unifying over-smoothing and over-squashing in graph neural networks: A physics informed approach and beyondZhiqi Shao, Dai Shi, Andi Han et al. · tsinghua
Graph Neural Networks (GNNs) have emerged as one of the leading approaches for machine learning on graph-structured data. Despite their great success, critical computational challenges such as over-smoothing, over-squashing, and limited expressive power continue to impact the performance of GNNs. In this study, inspired from the time-reversal principle commonly utilized in classical and quantum physics, we reverse the time direction of the graph heat equation. The resulted reversing process yields a class of high pass filtering functions that enhance the sharpness of graph node features. Leveraging this concept, we introduce the Multi-Scaled Heat Kernel based GNN (MHKG) by amalgamating diverse filtering functions' effects on node features. To explore more flexible filtering conditions, we further generalize MHKG into a model termed G-MHKG and thoroughly show the roles of each element in controlling over-smoothing, over-squashing and expressive power. Notably, we illustrate that all aforementioned issues can be characterized and analyzed via the properties of the filtering functions, and uncover a trade-off between over-smoothing and over-squashing: enhancing node feature sharpness will make model suffer more from over-squashing, and vice versa. Furthermore, we manipulate the time again to show how G-MHKG can handle both two issues under mild conditions. Our conclusive experiments highlight the effectiveness of proposed models. It surpasses several GNN baseline models in performance across graph datasets characterized by both homophily and heterophily.
LGOct 8, 2022
Generalized energy and gradient flow via graph frameletsAndi Han, Dai Shi, Zhiqi Shao et al.
In this work, we provide a theoretical understanding of the framelet-based graph neural networks through the perspective of energy gradient flow. By viewing the framelet-based models as discretized gradient flows of some energy, we show it can induce both low-frequency and high-frequency-dominated dynamics, via the separate weight matrices for different frequency components. This substantiates its good empirical performance on both homophilic and heterophilic graphs. We then propose a generalized energy via framelet decomposition and show its gradient flow leads to a novel graph neural network, which includes many existing models as special cases. We then explain how the proposed model generally leads to more flexible dynamics, thus potentially enhancing the representation power of graph neural networks.
LGJul 19, 2023
How Curvature Enhance the Adaptation Power of Framelet GCNsDai Shi, Yi Guo, Zhiqi Shao et al.
Graph neural network (GNN) has been demonstrated powerful in modeling graph-structured data. However, despite many successful cases of applying GNNs to various graph classification and prediction tasks, whether the graph geometrical information has been fully exploited to enhance the learning performance of GNNs is not yet well understood. This paper introduces a new approach to enhance GNN by discrete graph Ricci curvature. Specifically, the graph Ricci curvature defined on the edges of a graph measures how difficult the information transits on one edge from one node to another based on their neighborhoods. Motivated by the geometric analogy of Ricci curvature in the graph setting, we prove that by inserting the curvature information with different carefully designed transformation function $ζ$, several known computational issues in GNN such as over-smoothing can be alleviated in our proposed model. Furthermore, we verified that edges with very positive Ricci curvature (i.e., $κ_{i,j} \approx 1$) are preferred to be dropped to enhance model's adaption to heterophily graph and one curvature based graph edge drop algorithm is proposed. Comprehensive experiments show that our curvature-based GNN model outperforms the state-of-the-art baselines in both homophily and heterophily graph datasets, indicating the effectiveness of involving graph geometric information in GNNs.
LGJul 13, 2023
Frameless Graph Knowledge DistillationDai Shi, Zhiqi Shao, Yi Guo et al.
Knowledge distillation (KD) has shown great potential for transferring knowledge from a complex teacher model to a simple student model in which the heavy learning task can be accomplished efficiently and without losing too much prediction accuracy. Recently, many attempts have been made by applying the KD mechanism to the graph representation learning models such as graph neural networks (GNNs) to accelerate the model's inference speed via student models. However, many existing KD-based GNNs utilize MLP as a universal approximator in the student model to imitate the teacher model's process without considering the graph knowledge from the teacher model. In this work, we provide a KD-based framework on multi-scaled GNNs, known as graph framelet, and prove that by adequately utilizing the graph knowledge in a multi-scaled manner provided by graph framelet decomposition, the student model is capable of adapting both homophilic and heterophilic graphs and has the potential of alleviating the over-squashing issue with a simple yet effectively graph surgery. Furthermore, we show how the graph knowledge supplied by the teacher is learned and digested by the student model via both algebra and geometry. Comprehensive experiments show that our proposed model can generate learning accuracy identical to or even surpass the teacher model while maintaining the high speed of inference.
LGOct 27, 2022
Generalized Laplacian Regularized Framelet Graph Neural NetworksZhiqi Shao, Andi Han, Dai Shi et al.
This paper introduces a novel Framelet Graph approach based on p-Laplacian GNN. The proposed two models, named p-Laplacian undecimated framelet graph convolution (pL-UFG) and generalized p-Laplacian undecimated framelet graph convolution (pL-fUFG) inherit the nature of p-Laplacian with the expressive power of multi-resolution decomposition of graph signals. The empirical study highlights the excellent performance of the pL-UFG and pL-fUFG in different graph learning tasks including node classification and signal denoising.
LGSep 8, 2024
STLLM-DF: A Spatial-Temporal Large Language Model with Diffusion for Enhanced Multi-Mode Traffic System ForecastingZhiqi Shao, Haoning Xi, Haohui Lu et al.
The rapid advancement of Intelligent Transportation Systems (ITS) presents challenges, particularly with missing data in multi-modal transportation and the complexity of handling diverse sequential tasks within a centralized framework. To address these issues, we propose the Spatial-Temporal Large Language Model Diffusion (STLLM-DF), an innovative model that leverages Denoising Diffusion Probabilistic Models (DDPMs) and Large Language Models (LLMs) to improve multi-task transportation prediction. The DDPM's robust denoising capabilities enable it to recover underlying data patterns from noisy inputs, making it particularly effective in complex transportation systems. Meanwhile, the non-pretrained LLM dynamically adapts to spatial-temporal relationships within multi-modal networks, allowing the system to efficiently manage diverse transportation tasks in both long-term and short-term predictions. Extensive experiments demonstrate that STLLM-DF consistently outperforms existing models, achieving an average reduction of 2.40\% in MAE, 4.50\% in RMSE, and 1.51\% in MAPE. This model significantly advances centralized ITS by enhancing predictive accuracy, robustness, and overall system performance across multiple tasks, thus paving the way for more effective spatio-temporal traffic forecasting through the integration of frozen transformer language models and diffusion techniques.
LGJan 9
Toward an Integrated Cross-Urban Accident Prevention System: A Multi-Task Spatial-Temporal Learning Framework for Urban Safety ManagementJiayu Fang, Zhiqi Shao, Haoning Xi et al.
The development of a cross-city accident prevention system is particularly challenging due to the heterogeneity, inconsistent reporting, and inherently clustered, sparse, cyclical, and noisy nature of urban accident data. These intrinsic data properties, combined with fragmented governance and incompatible reporting standards, have long hindered the creation of an integrated, cross-city accident prevention framework. To address this gap, we propose the Mamba Local-ttention Spatial-Temporal Network MLA-STNet, a unified system that formulates accident risk prediction as a multi-task learning problem across multiple cities. MLA-STNet integrates two complementary modules: (i)the Spatio-Temporal Geographical Mamba-Attention (STG-MA), which suppresses unstable spatio-temporal fluctuations and strengthens long-range temporal dependencies; and (ii) the Spatio-Temporal Semantic Mamba-Attention (STS-MA), which mitigates cross-city heterogeneity through a shared-parameter design that jointly trains all cities while preserving individual semantic representation spaces. We validate the proposed framework through 75 experiments under two forecasting scenarios, full-day and high-frequency accident periods, using real-world datasets from New York City and Chicago. Compared with the state-of-the-art baselines, MLA-STNet achieves up to 6% lower RMSE, 8% higher Recall, and 5% higher MAP, while maintaining less than 1% performance variation under 50% input noise. These results demonstrate that MLA-STNet effectively unifies heterogeneous urban datasets within a scalable, robust, and interpretable Cross-City Accident Prevention System, paving the way for coordinated and data-driven urban safety management.
LGMar 26, 2024
CCDSReFormer: Traffic Flow Prediction with a Criss-Crossed Dual-Stream Enhanced Rectified Transformer ModelZhiqi Shao, Michael G. H. Bell, Ze Wang et al.
Accurate, and effective traffic forecasting is vital for smart traffic systems, crucial in urban traffic planning and management. Current Spatio-Temporal Transformer models, despite their prediction capabilities, struggle with balancing computational efficiency and accuracy, favoring global over local information, and handling spatial and temporal data separately, limiting insight into complex interactions. We introduce the Criss-Crossed Dual-Stream Enhanced Rectified Transformer model (CCDSReFormer), which includes three innovative modules: Enhanced Rectified Spatial Self-attention (ReSSA), Enhanced Rectified Delay Aware Self-attention (ReDASA), and Enhanced Rectified Temporal Self-attention (ReTSA). These modules aim to lower computational needs via sparse attention, focus on local information for better traffic dynamics understanding, and merge spatial and temporal insights through a unique learning method. Extensive tests on six real-world datasets highlight CCDSReFormer's superior performance. An ablation study also confirms the significant impact of each component on the model's predictive accuracy, showcasing our model's ability to forecast traffic flow effectively.
LGApr 24, 2024
ST-MambaSync: The Complement of Mamba and Transformers for Spatial-Temporal in Traffic Flow PredictionZhiqi Shao, Xusheng Yao, Ze Wang et al.
Accurate traffic flow prediction is crucial for optimizing traffic management, enhancing road safety, and reducing environmental impacts. Existing models face challenges with long sequence data, requiring substantial memory and computational resources, and often suffer from slow inference times due to the lack of a unified summary state. This paper introduces ST-MambaSync, an innovative traffic flow prediction model that combines transformer technology with the ST-Mamba block, representing a significant advancement in the field. We are the pioneers in employing the Mamba mechanism which is an attention mechanism integrated with ResNet within a transformer framework, which significantly enhances the model's explainability and performance. ST-MambaSync effectively addresses key challenges such as data length and computational efficiency, setting new benchmarks for accuracy and processing speed through comprehensive comparative analysis. This development has significant implications for urban planning and real-time traffic management, establishing a new standard in traffic flow prediction technology.
LGApr 20, 2024
ST-Mamba: Spatial-Temporal Selective State Space Model for Traffic Flow PredictionZhiqi Shao, Michael G. H. Bell, Ze Wang et al.
Traffic flow prediction, a critical aspect of intelligent transportation systems, has been increasingly popular in the field of artificial intelligence, driven by the availability of extensive traffic data. The current challenges of traffic flow prediction lie in integrating diverse factors while balancing the trade-off between computational complexity and the precision necessary for effective long-range and large-scale predictions. To address these challenges, we introduce a Spatial-Temporal Selective State Space (ST-Mamba) model, which is the first to leverage the power of spatial-temporal learning in traffic flow prediction without using graph modeling. The ST-Mamba model can effectively capture the long-range dependency for traffic flow data, thereby avoiding the issue of over-smoothing. The proposed ST-Mamba model incorporates an effective Spatial-Temporal Mixer (ST-Mixer) to seamlessly integrate spatial and temporal data processing into a unified framework and employs a Spatial-Temporal Selective State Space (ST-SSM) block to improve computational efficiency. The proposed ST-Mamba model, specifically designed for spatial-temporal data, simplifies processing procedure and enhances generalization capabilities, thereby significantly improving the accuracy of long-range traffic flow prediction. Compared to the previous state-of-the-art (SOTA) model, the proposed ST-Mamba model achieves a 61.11\% improvement in computational speed and increases prediction accuracy by 0.67\%. Extensive experiments with real-world traffic datasets demonstrate that the \textsf{ST-Mamba} model sets a new benchmark in traffic flow prediction, achieving SOTA performance in computational efficiency for both long- and short-range predictions and significantly improving the overall efficiency and effectiveness of traffic management.
AIAug 19, 2025
STPFormer: A State-of-the-Art Pattern-Aware Spatio-Temporal Transformer for Traffic ForecastingJiayu Fang, Zhiqi Shao, S T Boris Choy et al.
Spatio-temporal traffic forecasting is challenging due to complex temporal patterns, dynamic spatial structures, and diverse input formats. Although Transformer-based models offer strong global modeling, they often struggle with rigid temporal encoding and weak space-time fusion. We propose STPFormer, a Spatio-Temporal Pattern-Aware Transformer that achieves state-of-the-art performance via unified and interpretable representation learning. It integrates four modules: Temporal Position Aggregator (TPA) for pattern-aware temporal encoding, Spatial Sequence Aggregator (SSA) for sequential spatial learning, Spatial-Temporal Graph Matching (STGM) for cross-domain alignment, and an Attention Mixer for multi-scale fusion. Experiments on five real-world datasets show that STPFormer consistently sets new SOTA results, with ablation and visualizations confirming its effectiveness and generalizability.
LGAug 19, 2025
SVDformer: Direction-Aware Spectral Graph Embedding Learning via SVD and TransformerJiayu Fang, Zhiqi Shao, S T Boris Choy et al.
Directed graphs are widely used to model asymmetric relationships in real-world systems. However, existing directed graph neural networks often struggle to jointly capture directional semantics and global structural patterns due to their isotropic aggregation mechanisms and localized filtering mechanisms. To address this limitation, this paper proposes SVDformer, a novel framework that synergizes SVD and Transformer architecture for direction-aware graph representation learning. SVDformer first refines singular value embeddings through multi-head self-attention, adaptively enhancing critical spectral components while suppressing high-frequency noise. This enables learnable low-pass/high-pass graph filtering without requiring spectral kernels. Furthermore, by treating singular vectors as directional projection bases and singular values as scaling factors, SVDformer uses the Transformer to model multi-scale interactions between incoming/outgoing edge patterns through attention weights, thereby explicitly preserving edge directionality during feature propagation. Extensive experiments on six directed graph benchmarks demonstrate that SVDformer consistently outperforms state-of-the-art GNNs and direction-aware baselines on node classification tasks, establishing a new paradigm for learning representations on directed graphs.
LGMay 25, 2023
Revisiting Generalized p-Laplacian Regularized Framelet GCNs: Convergence, Energy Dynamic and Training with Non-Linear DiffusionDai Shi, Zhiqi Shao, Yi Guo et al.
This paper presents a comprehensive theoretical analysis of the graph p-Laplacian regularized framelet network (pL-UFG) to establish a solid understanding of its properties. We conduct a convergence analysis on pL-UFG, addressing the gap in the understanding of its asymptotic behaviors. Further by investigating the generalized Dirichlet energy of pL-UFG, we demonstrate that the Dirichlet energy remains non-zero throughout convergence, ensuring the avoidance of over-smoothing issues. Additionally, we elucidate the energy dynamic perspective, highlighting the synergistic relationship between the implicit layer in pL-UFG and graph framelets. This synergy enhances the model's adaptability to both homophilic and heterophilic data. Notably, we reveal that pL-UFG can be interpreted as a generalized non-linear diffusion process, thereby bridging the gap between pL-UFG and differential equations on the graph. Importantly, these multifaceted analyses lead to unified conclusions that offer novel insights for understanding and implementing pL-UFG, as well as other graph neural network (GNN) models. Finally, based on our dynamic analysis, we propose two novel pL-UFG models with manually controlled energy dynamics. We demonstrate empirically and theoretically that our proposed models not only inherit the advantages of pL-UFG but also significantly reduce computational costs for training on large-scale graph datasets.