What Makes Graph Neural Networks Miscalibrated?
This work addresses the need for reliable uncertainty estimation in GNNs, which is crucial for applications like social network analysis or drug discovery, though it is incremental as it builds on existing calibration techniques.
The paper tackled the problem of miscalibration in graph neural networks (GNNs) by identifying five key factors influencing calibration and developing Graph Attention Temperature Scaling (GATS), a novel method that achieved state-of-the-art calibration results across various datasets and GNN backbones.
Given the importance of getting calibrated predictions and reliable uncertainty estimations, various post-hoc calibration methods have been developed for neural networks on standard multi-class classification tasks. However, these methods are not well suited for calibrating graph neural networks (GNNs), which presents unique challenges such as accounting for the graph structure and the graph-induced correlations between the nodes. In this work, we conduct a systematic study on the calibration qualities of GNN node predictions. In particular, we identify five factors which influence the calibration of GNNs: general under-confident tendency, diversity of nodewise predictive distributions, distance to training nodes, relative confidence level, and neighborhood similarity. Furthermore, based on the insights from this study, we design a novel calibration method named Graph Attention Temperature Scaling (GATS), which is tailored for calibrating graph neural networks. GATS incorporates designs that address all the identified influential factors and produces nodewise temperature scaling using an attention-based architecture. GATS is accuracy-preserving, data-efficient, and expressive at the same time. Our experiments empirically verify the effectiveness of GATS, demonstrating that it can consistently achieve state-of-the-art calibration results on various graph datasets for different GNN backbones.