LG SI ML, Nov 14, 2018

Pitfalls of Graph Neural Network Evaluation

Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, Stephan Günnemann

arXiv:1811.05868v2 | 47.2 | 1793 citations

Originality: Incremental advance

AI Analysis

This work addresses evaluation pitfalls in GNN research, a concern for researchers and practitioners in graph mining who depend on reliable benchmarking and model selection.

The paper identifies serious shortcomings in existing evaluation strategies for graph neural networks (GNNs), showing that using fixed data splits and inconsistent training procedures leads to unfair comparisons, and reveals that simpler GNN architectures can outperform more sophisticated ones with fair tuning.

Semi-supervised node classification in graphs is a fundamental problem in graph mining, and the recently proposed graph neural networks (GNNs) have achieved unparalleled results on this task. Due to their massive success, GNNs have attracted a lot of attention, and many novel architectures have been put forward. In this paper we show that existing evaluation strategies for GNN models have serious shortcomings. We show that using the same train/validation/test splits of the same datasets, as well as making significant changes to the training procedure (e.g., early stopping criteria), precludes a fair comparison of different architectures. We perform a thorough empirical evaluation of four prominent GNN models and show that considering different splits of the data leads to dramatically different rankings of models. Even more importantly, our findings suggest that simpler GNN architectures are able to outperform the more sophisticated ones if the hyperparameters and the training procedure are tuned fairly for all models.
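The abstract's central recommendation, scoring every model on many random splits instead of one fixed split, can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names `random_split` and `evaluate_over_splits`, the per-class sampling scheme, the split sizes, and the toy `constant_baseline` are all assumptions made for the example. Each "model" is assumed to be a callable that trains on the given indices and returns a test accuracy.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def random_split(labels, n_train_per_class, n_val_per_class, seed):
    """Draw one random train/validation/test split, sampled per class.

    Hypothetical helper: the per-class sizes are illustrative assumptions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for node, label in enumerate(labels):
        by_class[label].append(node)

    train, val, test = [], [], []
    for label in sorted(by_class):
        nodes = by_class[label][:]
        rng.shuffle(nodes)
        cut = n_train_per_class + n_val_per_class
        train += nodes[:n_train_per_class]
        val += nodes[n_train_per_class:cut]
        test += nodes[cut:]  # every remaining node goes to the test set
    return train, val, test


def evaluate_over_splits(models, labels, n_splits=20,
                         n_train_per_class=2, n_val_per_class=2):
    """Score every model on the same sequence of random splits.

    `models` maps a name to a callable (train, val, test) -> accuracy.
    Returns {name: (mean accuracy, standard deviation)}.
    """
    scores = {name: [] for name in models}
    for seed in range(n_splits):
        train, val, test = random_split(
            labels, n_train_per_class, n_val_per_class, seed)
        # All models see the identical split for a given seed.
        for name, train_and_score in models.items():
            scores[name].append(train_and_score(train, val, test))
    return {name: (mean(s), stdev(s)) for name, s in scores.items()}


# Toy usage: 60 nodes in 3 balanced classes and a placeholder "model"
# that ignores the training data and always predicts class 0.
labels = [i % 3 for i in range(60)]


def constant_baseline(train, val, test):
    return sum(labels[i] == 0 for i in test) / len(test)


summary = evaluate_over_splits({"constant": constant_baseline}, labels, n_splits=5)
```

Reporting a mean and a spread per model over the same seeds, rather than a single number from one split, is what makes the resulting ranking comparable across architectures. Applying the same hyperparameter search and early stopping rule inside each callable would address the second concern raised in the abstract.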

