LG, Feb 9

Rethinking Graph Generalization through the Lens of Sharpness-Aware Minimization

arXiv:2602.08855v1 1.4 h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses a critical issue for graph machine learning practitioners, namely model robustness to distribution shifts, but the advance is incremental because it builds on existing sharpness-aware minimization concepts.

The paper tackles the sensitivity of graph neural networks to distribution shifts, specifically the Minimal Shift Flip (MSF) phenomenon, in which test samples that deviate only slightly from the training distribution are misclassified. It proposes an energy-driven generative augmentation framework (E2A) that improves graph out-of-distribution (OOD) generalization and, in the authors' experiments, outperforms state-of-the-art baselines.

Graph Neural Networks (GNNs) have achieved remarkable success across various graph-based tasks but remain highly sensitive to distribution shifts. In this work, we focus on a prevalent yet under-explored phenomenon in graph generalization, Minimal Shift Flip (MSF), where test samples that slightly deviate from the training distribution are abruptly misclassified. To interpret this phenomenon, we revisit MSF through the lens of Sharpness-Aware Minimization (SAM), which characterizes the local stability and sharpness of the loss landscape while providing a theoretical foundation for modeling generalization error. To quantify loss sharpness, we introduce the concept of Local Robust Radius, which measures the smallest perturbation required to flip a prediction and establishes a theoretical link between local stability and generalization. Building on this perspective, we further observe a continual decrease in the robust radius during training, indicating weakened local stability and an increasingly sharp loss landscape that gives rise to MSF. To jointly address the MSF phenomenon and the intractability of the robust radius, we develop an energy-based formulation that is theoretically proven to be monotonically correlated with the robust radius, offering a tractable and principled objective for modeling flatness and stability. Building on these insights, we propose an energy-driven generative augmentation framework (E2A) that leverages energy-guided latent perturbations to generate pseudo-OOD samples and enhance model generalization. Extensive experiments across multiple benchmarks demonstrate that E2A consistently improves graph OOD generalization, outperforming state-of-the-art baselines.
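
To make the idea of a robust radius concrete, the sketch below computes the smallest L2 perturbation that flips a prediction, along with a generic energy score. It is a toy illustration under stated assumptions, not the paper's method: it uses a linear classifier in place of a GNN, where the radius has a closed form, and it uses the standard energy-based-model definition E(x) = -log sum_k exp(z_k), which may differ from the formulation in the paper. The names `local_robust_radius`, `energy`, `W`, and `b` are hypothetical. This toy setup does not reproduce the paper's monotonicity result.

```python
import numpy as np

def logits(W, b, x):
    """Logits z = W x + b of a linear classifier (a stand-in for a GNN head)."""
    return W @ x + b

def local_robust_radius(W, b, x):
    """Smallest L2 perturbation of x that changes the predicted class.

    For a linear model, the boundary between the predicted class c and any
    other class j is a hyperplane, so the distance to it is
    (z_c - z_j) / ||w_c - w_j||. The radius is the minimum over j != c.
    """
    z = logits(W, b, x)
    c = int(np.argmax(z))
    return min(
        (z[c] - z[j]) / np.linalg.norm(W[c] - W[j])
        for j in range(len(z)) if j != c
    )

def energy(W, b, x):
    """Generic free energy E(x) = -log sum_k exp(z_k), computed stably.

    Assumption: this is the common energy-based-model definition, not
    necessarily the one used in the paper.
    """
    z = logits(W, b, x)
    m = np.max(z)
    return -(m + np.log(np.sum(np.exp(z - m))))

# Hypothetical two-class toy model and two inputs.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
x_far = np.array([3.0, 0.0])    # far from the decision boundary
x_near = np.array([0.6, 0.5])   # close to the decision boundary

r_near = local_robust_radius(W, b, x_near)

# Step just past the radius, straight toward the boundary: the label flips.
c = int(np.argmax(logits(W, b, x_near)))
direction = -(W[c] - W[1 - c]) / np.linalg.norm(W[c] - W[1 - c])
x_flipped = x_near + 1.01 * r_near * direction

print("radius (far): ", local_robust_radius(W, b, x_far))
print("radius (near):", r_near)
print("label before/after:", c, int(np.argmax(logits(W, b, x_flipped))))
print("energy (far, near):", energy(W, b, x_far), energy(W, b, x_near))
```

In this example `x_near` sits close to the boundary, so a tiny shift is enough to change its label, which is the behavior the abstract calls Minimal Shift Flip. For a real GNN the radius has no closed form, which is the intractability the energy-based objective is meant to sidestep.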
