LG, Dec 19, 2020

A pipeline for fair comparison of graph neural networks in node classification tasks

Wentao Zhao, Dalin Zhou, Xinguo Qiu, Wei Jiang

arXiv:2012.10619

Originality: Incremental advance

AI Analysis

This work provides a standardized benchmark and experimental pipeline that lets researchers and practitioners compare GNN architectures fairly on node classification. Its contribution is an incremental improvement in experimental rigor.

This paper addresses the lack of standardized training settings for comparing graph neural networks (GNNs) in node classification. The authors build a benchmark with 9 datasets and 7 models and find that topological information is important, that adding layers does not improve performance except on the PATTERN and CLUSTER datasets, and that node2vec-based data augmentation substantially improves baseline performance.

Graph neural networks (GNNs) have been investigated for potential applicability in multiple fields that employ graph data. However, there are no standard training settings to ensure fair comparisons among new methods, including different model architectures and data augmentation techniques. We introduce a standard, reproducible benchmark to which the same training settings can be applied for node classification. For this benchmark, we constructed 9 datasets, including both small- and medium-scale datasets from different fields, and 7 different models. We design a k-fold model assessment strategy for small datasets and a standard set of model training procedures for all datasets, enabling a standard experimental pipeline for GNNs to help ensure fair model architecture comparisons. We use node2vec and Laplacian eigenvectors to perform data augmentation to investigate how input features affect the performance of the models. We find topological information is important for node classification tasks. Increasing the number of model layers does not improve the performance except on the PATTERN and CLUSTER datasets, in which the graphs are not connected. Data augmentation is highly useful, especially using node2vec in the baseline, resulting in a substantial baseline performance improvement.
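To make the Laplacian-eigenvector augmentation mentioned in the abstract concrete, here is a minimal sketch of one common way to compute such features and append them to node inputs. It is not taken from the paper's code. The function names, the dense NumPy adjacency matrix, the symmetric normalized Laplacian, and the choice to skip the first (trivial) eigenvector are all assumptions made for illustration, and the sketch assumes a connected, undirected graph.

```python
import numpy as np


def laplacian_eigenvector_features(adj, k):
    """Return k eigenvectors of the symmetric normalized Laplacian.

    Hypothetical helper: `adj` is a dense, symmetric adjacency matrix.
    The eigenvector with the smallest eigenvalue is skipped, which
    assumes the graph is connected.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)

    # D^{-1/2}, leaving zero-degree entries at 0 to avoid division by zero.
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5

    # L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigvecs = np.linalg.eigh(lap)
    return eigvecs[:, 1:k + 1]


def augment_with_laplacian_eigenvectors(features, adj, k):
    """Concatenate the original node features with k Laplacian eigenvectors."""
    pos = laplacian_eigenvector_features(adj, k)
    return np.concatenate([np.asarray(features, dtype=float), pos], axis=1)
```

For a graph with `n` nodes and `d` input features, `augment_with_laplacian_eigenvectors(x, adj, k)` returns an `n × (d + k)` matrix. Eigenvector signs are arbitrary, so pipelines that use this kind of feature often randomize the sign during training. Whether the paper does so is not stated in the abstract.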
