LG, AI · Jun 2, 2023

Demystifying Structural Disparity in Graph Neural Networks: Can One Size Fit All?

Haitao Mao, Zhikai Chen, Wei Jin, Haoyu Han, Yao Ma, Tong Zhao, Neil Shah, Jiliang Tang

arXiv:2306.01323v3

Originality: Incremental advance

AI Analysis

This work addresses a key limitation of GNNs for node classification on real-world graphs that mix structural patterns, offering insights into improving generalization and out-of-distribution performance.

The study examines why Graph Neural Networks (GNNs) perform unevenly on nodes with different structural patterns. GNNs do well on homophilic nodes in homophilic graphs and on heterophilic nodes in heterophilic graphs, but struggle on the opposite node sets. Theoretical and empirical analysis identifies two key causes: the aggregated feature distance and the homophily ratio difference between training and testing nodes.
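As a rough illustration of the node-level view this analysis takes, the sketch below computes a node homophily ratio (the fraction of a node's neighbors that share its label) and then reports accuracy separately on homophilic and heterophilic nodes. The function names and the 0.5 threshold that separates the two groups are assumptions made for illustration; this is not the authors' code.

```python
import numpy as np

def node_homophily(edges, labels, num_nodes):
    """Fraction of each node's neighbors that share its label.

    edges: int array of shape (E, 2), treated as undirected.
    Isolated nodes get NaN because homophily is undefined for them.
    """
    # Duplicate every edge in both directions so each endpoint sees the other.
    src = np.concatenate([edges[:, 0], edges[:, 1]])
    dst = np.concatenate([edges[:, 1], edges[:, 0]])
    same = (labels[src] == labels[dst]).astype(float)
    same_count = np.bincount(src, weights=same, minlength=num_nodes)
    degree = np.bincount(src, minlength=num_nodes).astype(float)
    h = np.full(num_nodes, np.nan)
    has_neighbors = degree > 0
    h[has_neighbors] = same_count[has_neighbors] / degree[has_neighbors]
    return h

def accuracy_by_pattern(h, y_true, y_pred, threshold=0.5):
    """Accuracy on homophilic (h > threshold) and heterophilic (h <= threshold) nodes.

    The threshold is an assumed cut-off, not one taken from the paper.
    """
    correct = y_true == y_pred
    homophilic = h > threshold
    heterophilic = h <= threshold  # NaN compares False, so isolated nodes are skipped
    return correct[homophilic].mean(), correct[heterophilic].mean()

# Toy graph: 4 nodes, two classes.
edges = np.array([[0, 1], [1, 2], [2, 3], [0, 2]])
labels = np.array([0, 0, 1, 1])
h = node_homophily(edges, labels, num_nodes=4)
acc_homo, acc_hetero = accuracy_by_pattern(h, labels, np.array([0, 1, 1, 1]))
```

On this toy graph the ratios are 0.5, 0.5, 1/3 and 1.0, so only node 3 counts as homophilic. A large gap between the two accuracies is the kind of performance disparity the study describes.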

Recent studies on Graph Neural Networks (GNNs) provide both empirical and theoretical evidence supporting their effectiveness in capturing structural patterns on both homophilic and certain heterophilic graphs. Notably, most real-world homophilic and heterophilic graphs comprise a mixture of nodes with homophilic and heterophilic structural patterns, exhibiting a structural disparity. However, the analysis of GNN performance with respect to nodes exhibiting different structural patterns, e.g., homophilic nodes in heterophilic graphs, remains rather limited. In the present study, we provide evidence that GNNs on node classification typically perform admirably on homophilic nodes within homophilic graphs and heterophilic nodes within heterophilic graphs while struggling on the opposite node set, exhibiting a performance disparity. We theoretically and empirically identify the effects of GNNs on testing nodes exhibiting distinct structural patterns. We then propose a rigorous, non-i.i.d. PAC-Bayesian generalization bound for GNNs, revealing the reasons for the performance disparity, namely the aggregated feature distance and the homophily ratio difference between training and testing nodes. Furthermore, we demonstrate the practical implications of our new findings by (1) elucidating the effectiveness of deeper GNNs; and (2) revealing an overlooked distribution shift factor in the graph out-of-distribution problem and proposing a new scenario accordingly.
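To make the two factors named in the bound more concrete, here is a hypothetical proxy. It smooths features with one round of mean aggregation (self-loop included, a GCN-like assumption), matches each test node to its nearest training node in that aggregated space, and reports both the feature distance and the homophily-ratio difference to that match. This is a simplified sketch under assumed definitions. It does not reproduce the paper's PAC-Bayesian bound or its exact distance measure, and `mean_aggregate` and `train_test_gaps` are illustrative names.

```python
import numpy as np

def mean_aggregate(x, edges, num_nodes):
    """One round of mean aggregation over neighbors plus a self-loop (assumed smoothing)."""
    self_loops = np.arange(num_nodes)
    src = np.concatenate([edges[:, 0], edges[:, 1], self_loops])
    dst = np.concatenate([edges[:, 1], edges[:, 0], self_loops])
    agg = np.zeros_like(x, dtype=float)
    np.add.at(agg, src, x[dst])  # unbuffered sum of neighbor (and own) features
    deg = np.bincount(src, minlength=num_nodes).astype(float)
    return agg / deg[:, None]

def train_test_gaps(agg, h, train_idx, test_idx):
    """Per test node: distance to its nearest training node in aggregated-feature
    space, and the absolute homophily-ratio difference to that same training node."""
    diff = agg[test_idx][:, None, :] - agg[train_idx][None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (n_test, n_train)
    nearest = dist.argmin(axis=1)
    feat_gap = dist[np.arange(len(test_idx)), nearest]
    homo_gap = np.abs(h[test_idx] - h[train_idx][nearest])
    return feat_gap, homo_gap

# Toy graph with 1-D features; h holds precomputed node homophily ratios.
edges = np.array([[0, 1], [1, 2], [2, 3], [0, 2]])
x = np.array([[0.0], [1.0], [2.0], [3.0]])
h = np.array([0.5, 0.5, 1 / 3, 1.0])
agg = mean_aggregate(x, edges, num_nodes=4)
feat_gap, homo_gap = train_test_gaps(agg, h, np.array([0, 3]), np.array([1, 2]))
```

Here node 1 aggregates to exactly the same value as training node 0 and has the same homophily ratio, so both gaps are zero. Node 2 is farther from every training node and differs in homophily, which under this proxy marks it as the harder test node.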
