Are Graph Neural Networks Miscalibrated?
This work highlights a critical reliability problem for GNNs in high-stakes decision-making applications, although its contribution is incremental because it builds on prior calibration research.
The paper investigates the calibration of Graph Neural Networks (GNNs) across multiple datasets, finding that GNNs are well-calibrated in some cases but severely miscalibrated in others, and that existing calibration methods are insufficient to fully address the issue.
Graph Neural Networks (GNNs) have proven successful in many classification tasks, outperforming previous state-of-the-art methods in terms of accuracy. However, accuracy alone is not enough for high-stakes decision making. Decision makers want to know the likelihood that a specific GNN prediction is correct. For this purpose, obtaining calibrated models is essential. In this work, we perform an empirical evaluation of the calibration of state-of-the-art GNNs on multiple datasets. Our experiments show that GNNs can be well calibrated on some datasets but badly miscalibrated on others, and that state-of-the-art calibration methods help but do not fix the problem.
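To make the notion of calibration concrete, the following is a minimal sketch of one common way to quantify miscalibration, the expected calibration error (ECE). The text above does not say which metric the evaluation uses, so the choice of ECE, the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are all assumptions made for illustration. The sketch groups predictions by confidence and compares the average confidence in each bin with the fraction of predictions in that bin that were actually correct.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Illustrative ECE with equal-width confidence bins (an assumed metric).

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: whether each prediction matched the true label.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_count = [0] * n_bins    # number of correct predictions per bin
    bin_count = [0] * n_bins    # number of predictions per bin

    for conf, ok in zip(confidences, correct):
        # Map the confidence to a bin index; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_count[idx] += 1 if ok else 0
        bin_count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if bin_count[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[b] / bin_count[b]
        accuracy = hit_count[b] / bin_count[b]
        # Weight each bin's confidence/accuracy gap by its share of predictions.
        ece += (bin_count[b] / n) * abs(accuracy - avg_conf)
    return ece


# Hypothetical example: four predictions at 0.9 confidence, three of them correct.
example_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

In this hypothetical example the model claims 90% confidence but is right only 75% of the time, so the gap between the two is what the metric reports. A perfectly calibrated model would score 0 under this measure.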