Towards a Taxonomy of Graph Learning Datasets
This work addresses a foundational problem for the graph learning community: it provides a taxonomy of datasets intended to improve benchmarking and model development. The contribution is incremental in that it builds on existing dataset analysis.
The authors address the lack of systematic understanding of graph neural network (GNN) benchmarking datasets by developing a principled approach that taxonomizes graph datasets through carefully designed perturbations. The result is a new understanding of the dataset characteristics that matter for model evaluation and for the development of more specialized GNNs.
Graph neural networks (GNNs) have attracted much attention due to their ability to leverage the intrinsic geometries of the underlying data. Although many different types of GNN models have been developed, along with many benchmarking procedures to demonstrate the superiority of one GNN model over the others, there is a lack of systematic understanding of the underlying benchmarking datasets and of which aspects of a model they actually test. Here, we provide a principled approach to taxonomize graph benchmarking datasets by carefully designing a collection of graph perturbations to probe the essential data characteristics that GNN models leverage to perform predictions. Our data-driven taxonomization of graph datasets provides a new understanding of critical dataset characteristics that will enable better model evaluation and the development of more specialized GNN models.
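To make the perturbation idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name specific perturbations, so the three used here (dropping all edges, shuffling node features, replacing features with constants) are illustrative assumptions. The `evaluate` callable is also hypothetical: it stands in for training and scoring a GNN on a graph given as an edge list and a list of per-node feature vectors. The function `sensitivity_profile` records how the score changes under each perturbation.

```python
import random

# Each perturbation takes (edges, feats, rng) and returns a perturbed copy.
# These three are illustrative assumptions, not the paper's actual list.

def drop_all_edges(edges, feats, rng):
    """Remove graph structure entirely; keep node features."""
    return [], [row[:] for row in feats]

def shuffle_node_features(edges, feats, rng):
    """Reassign feature vectors to random nodes; keep graph structure."""
    shuffled = [row[:] for row in feats]
    rng.shuffle(shuffled)
    return list(edges), shuffled

def constant_node_features(edges, feats, rng):
    """Replace every feature value with 1.0; keep graph structure."""
    return list(edges), [[1.0] * len(row) for row in feats]

PERTURBATIONS = {
    "drop_all_edges": drop_all_edges,
    "shuffle_node_features": shuffle_node_features,
    "constant_node_features": constant_node_features,
}

def sensitivity_profile(evaluate, edges, feats, seed=0):
    """Return the score change (perturbed minus baseline) per perturbation.

    `evaluate(edges, feats)` is a hypothetical stand-in for training and
    scoring a GNN; it must return a single number.
    """
    rng = random.Random(seed)
    baseline = evaluate(edges, feats)
    profile = {}
    for name, perturb in PERTURBATIONS.items():
        p_edges, p_feats = perturb(edges, feats, rng)
        profile[name] = evaluate(p_edges, p_feats) - baseline
    return profile
```

Under this sketch, a dataset whose score barely moves when edges are dropped would be relying mostly on node features, while one that degrades sharply would be relying on structure. Profiles of this kind, computed across many datasets, are the sort of signal a data-driven taxonomy could group on.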