Subgroup Generalization and Fairness of Graph Neural Networks
This work addresses fairness and generalization issues in GNNs and is relevant to both researchers and practitioners, but its contribution is incremental, as it builds on existing PAC-Bayesian analysis.
The paper tackles the problem of understanding generalization and fairness in graph neural networks (GNNs) for non-IID node-level tasks, showing that the distance between test subgroups and the training set affects performance and fairness, with experimental support across models and datasets.
Despite the many successful applications of graph neural networks (GNNs), theoretical understanding of their generalization ability has been sparse, especially for node-level tasks where data are not independent and identically distributed (IID). A theoretical investigation of generalization performance helps in understanding fundamental issues of GNN models (such as fairness) and in designing better learning methods. In this paper, we present a novel PAC-Bayesian analysis for GNNs under a non-IID semi-supervised learning setup. Moreover, we analyze the generalization performance on different subgroups of unlabeled nodes, which allows us to further study an accuracy-(dis)parity-style (un)fairness of GNNs from a theoretical perspective. Under reasonable assumptions, we demonstrate that the distance between a test subgroup and the training set can be a key factor affecting GNN performance on that subgroup, which calls for special attention to training node selection for fair learning. Experiments across multiple GNN models and datasets support our theoretical results.
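To make the subgroup claim concrete, the following minimal sketch groups test nodes by their distance to the training set and reports per-group accuracy together with the gap between the best and worst group. The abstract does not specify the distance measure, so this sketch assumes shortest-path hop distance on the graph as a simple proxy. The function names (`hops_to_training_set`, `accuracy_by_distance`, `accuracy_disparity`) and the adjacency-dictionary input format are hypothetical choices for illustration, not the paper's implementation.

```python
from collections import defaultdict, deque


def hops_to_training_set(adj, train_nodes):
    """Multi-source BFS: hop distance from each node to its nearest training node.

    `adj` maps a node to an iterable of its neighbors (assumed input format).
    Nodes that cannot reach any training node are absent from the result.
    """
    dist = {v: 0 for v in train_nodes}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def accuracy_by_distance(adj, train_nodes, test_nodes, y_true, y_pred):
    """Accuracy of predictions on test nodes, grouped by hop distance (assumed proxy)."""
    dist = hops_to_training_set(adj, train_nodes)
    correct = defaultdict(int)
    total = defaultdict(int)
    for v in test_nodes:
        d = dist.get(v, float("inf"))  # unreachable nodes form their own group
        total[d] += 1
        correct[d] += int(y_true[v] == y_pred[v])
    return {d: correct[d] / total[d] for d in total}


def accuracy_disparity(acc_by_group):
    """Gap between the best and worst subgroup accuracy."""
    values = list(acc_by_group.values())
    return max(values) - min(values)
```

Given predictions from any trained GNN, calling `accuracy_by_distance` and then `accuracy_disparity` gives a quick empirical check of whether accuracy degrades for subgroups farther from the training nodes, which is the pattern the analysis predicts under its assumptions.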