Danial Saber

LG
h-index14
3papers
5citations
Novelty48%
AI Score41

3 Papers

20.7LGMay 30
RADE: Random Add-Drop Edge as a Regularizer

Danial Saber, Amirali Salehi-Abari

Graph Neural Networks (GNNs) suffer from overfitting and over-squashing of long-range information. Stochastic graph augmentations (e.g., edge deletion) regularize training against overfitting but can introduce train-inference misalignment and do not improve over-squashing. In contrast, rewiring methods improve connectivity to mitigate over-squashing, but are not designed to regularize training. We propose Random Add-Drop Edge (RADE), a stochastic graph augmentation method that jointly drops and adds edges to address both overfitting and over-squashing simultaneously. RADE is provably designed to align training and inference so that random augmentations regularize training without distribution shift, while supporting long-range communication at inference. We further propose and study a mini-batch gradient-norm balancing algorithm that adapts deletion and addition rates during training, rendering RADE hyperparameter-free in practice. Experiments on node- and graph-classification benchmarks show that RADE is a strong regularizer and mitigates over-squashing. Ablations support the roles of train-inference alignment, adaptive rate selection, and the complementary effects of random edge deletion and edge addition.

LGAug 12, 2025
Over-Squashing in GNNs and Causal Inference of Rewiring Strategies

Danial Saber, Amirali Salehi-Abari

Graph neural networks (GNNs) have exhibited state-of-the-art performance across wide-range of domains such as recommender systems, material design, and drug repurposing. Yet message-passing GNNs suffer from over-squashing -- exponential compression of long-range information from distant nodes -- which limits expressivity. Rewiring techniques can ease this bottleneck; but their practical impacts are unclear due to the lack of a direct empirical over-squashing metric. We propose a rigorous, topology-focused method for assessing over-squashing between node pairs using the decay rate of their mutual sensitivity. We then extend these pairwise assessments to four graph-level statistics (prevalence, intensity, variability, extremity). Coupling these metrics with a within-graph causal design, we quantify how rewiring strategies affect over-squashing on diverse graph- and node-classification benchmarks. Our extensive empirical analyses show that most graph classification datasets suffer from over-squashing (but to various extents), and rewiring effectively mitigates it -- though the degree of mitigation, and its translation into performance gains, varies by dataset and method. We also found that over-squashing is less notable in node classification datasets, where rewiring often increases over-squashing, and performance variations are uncorrelated with over-squashing changes. These findings suggest that rewiring is most beneficial when over-squashing is both substantial and corrected with restraint -- while overly aggressive rewiring, or rewiring applied to minimally over-squashed graphs, is unlikely to help and may even harm performance. Our plug-and-play diagnostic tool lets practitioners decide -- before any training -- whether rewiring is likely to pay off.

LGJun 17, 2024
Scalable Expressiveness through Preprocessed Graph Perturbations

Danial Saber, Amirali Salehi-Abari

Graph Neural Networks (GNNs) have emerged as the predominant method for analyzing graph-structured data. However, canonical GNNs have limited expressive power and generalization capability, thus triggering the development of more expressive yet computationally intensive methods. One such approach is to create a series of perturbed versions of input graphs and then repeatedly conduct multiple message-passing operations on all variations during training. Despite their expressive power, this approach does not scale well on larger graphs. To address this scalability issue, we introduce Scalable Expressiveness through Preprocessed Graph Perturbation (SE2P). This model offers a flexible, configurable balance between scalability and generalizability with four distinct configuration classes. At one extreme, the configuration prioritizes scalability through minimal learnable feature extraction and extensive preprocessing; at the other extreme, it enhances generalizability with more learnable feature extractions, though this increases scalability costs. We conduct extensive experiments on real-world datasets to evaluate the generalizability and scalability of SE2P variants compared to various state-of-the-art benchmarks. Our results indicate that, depending on the chosen SE2P configuration, the model can enhance generalizability compared to benchmarks while achieving significant speed improvements of up to 8-fold.