LG, Dec 26, 2022

2-hop Neighbor Class Similarity (2NCS): A graph structural metric indicative of graph neural network performance

Andrea Cavallo, Claas Grohnfeldt, Michele Russo, Giulio Lovisotto, Luca Vassio

arXiv:2212.13202v111.820 citations, h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses the challenge of understanding why GNN performance varies on heterophilous graphs. The advance is incremental: it builds on prior metrics to improve how accurately GNN performance can be predicted.

The paper tackles the problem of predicting Graph Neural Network (GNN) performance on heterophilous graphs by introducing 2-hop Neighbor Class Similarity (2NCS), a new graph structural metric. As validated on synthetic and real-world datasets, 2NCS correlates more strongly with GNN accuracy than existing metrics such as the homophily ratio and Cross-Class Neighborhood Similarity (CCNS).

Graph Neural Networks (GNNs) achieve state-of-the-art performance on graph-structured data across numerous domains. Their underlying ability to represent nodes as summaries of their vicinities has proven effective for homophilous graphs in particular, in which same-type nodes tend to connect. On heterophilous graphs, in which different-type nodes are likely connected, GNNs perform less consistently, as neighborhood information might be less representative or even misleading. On the other hand, GNN performance is not inferior on all heterophilous graphs, and there is a lack of understanding of what other graph properties affect GNN performance. In this work, we highlight the limitations of the widely used homophily ratio and the recent Cross-Class Neighborhood Similarity (CCNS) metric in estimating GNN performance. To overcome these limitations, we introduce 2-hop Neighbor Class Similarity (2NCS), a new quantitative graph structural property that correlates with GNN performance more strongly and consistently than alternative metrics. 2NCS considers two-hop neighborhoods as a theoretically derived consequence of the two-step label propagation process governing GCN's training-inference process. Experiments on one synthetic and eight real-world graph datasets confirm consistent improvements over existing metrics in estimating the accuracy of GCN- and GAT-based architectures on the node classification task.
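To make the contrast between one-hop and two-hop structure concrete, the following is a minimal sketch in plain Python. It computes the standard edge homophily ratio (the fraction of edges joining same-class nodes) and, for comparison, a simple two-hop same-class fraction. The second quantity and the function name `two_hop_same_class_fraction` are illustrative assumptions made here; they are not the paper's 2NCS definition, which the abstract does not spell out. The toy graph is also hypothetical.

```python
from collections import defaultdict


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def two_hop_same_class_fraction(edges, labels):
    """Fraction of ordered pairs (u, w), u != w, joined by a length-2 path
    that share a class label.

    Illustrative assumption only: this is NOT the paper's 2NCS formula.
    """
    # Build an undirected adjacency map.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    same = 0
    total = 0
    for u in adj:
        # Collect every node reachable from u via exactly two edges.
        two_hop = set()
        for v in adj[u]:
            two_hop |= adj[v]
        two_hop.discard(u)  # ignore the trivial path u -> v -> u
        for w in two_hop:
            total += 1
            same += labels[u] == labels[w]
    return same / total if total else 0.0


# Hypothetical toy graph: fully heterophilous and bipartite.
# Every edge links class "a" to class "b".
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
edges = [(0, 2), (0, 3), (1, 2), (1, 3)]

h1 = edge_homophily(edges, labels)
h2 = two_hop_same_class_fraction(edges, labels)
print(f"edge homophily: {h1}, two-hop same-class fraction: {h2}")
```

On this toy graph no edge connects same-class nodes, yet every two-hop neighbor shares the starting node's class. This is one simple way to see why the homophily ratio alone may not tell the whole story on heterophilous graphs, and why a two-hop view can carry class signal that a one-hop view misses.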
