Uncertainty in GNN Learning Evaluations: The Importance of a Consistent Benchmark for Community Detection
This work addresses unreliable evaluations that affect researchers and practitioners applying GNNs in domains such as social networks and genomics. Its contribution is incremental, however, because it focuses on standardizing evaluation rather than proposing new methods.
The authors tackle the problem of inconsistent benchmarks for evaluating Graph Neural Networks (GNNs) on unsupervised community detection. They propose a common evaluation protocol that reveals significant differences from previously reported performance and enables more reliable comparisons.
Graph Neural Networks (GNNs) have improved unsupervised community detection of clustered nodes because they can encode the dual dimensionality of a graph's connectivity and feature information spaces. Identifying latent communities has many practical applications, ranging from social networks to genomics. Current benchmarks of real-world performance are confusing because of the many decisions that influence how GNNs are evaluated on this task. To address this, we propose a framework that establishes a common evaluation protocol. We motivate and justify the protocol by demonstrating how results differ with and without it. The W Randomness Coefficient is a metric proposed for assessing the consistency of algorithm rankings, which quantifies the reliability of results in the presence of randomness. We find that when the same evaluation criteria are followed, performance may differ significantly from what has been reported for methods on this task, but a more complete evaluation and comparison of methods becomes possible.
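The abstract does not define the W Randomness Coefficient, so the following is only a minimal sketch of the kind of computation it describes. It assumes the coefficient builds on Kendall's coefficient of concordance W, computed over the rankings of algorithms obtained under different random seeds (each seed acts as one "rater"). Reading the randomness coefficient as 1 - W, and all function names below, are hypothetical illustrations rather than the authors' definition.

```python
from typing import List, Sequence


def rank_descending(scores: Sequence[float]) -> List[float]:
    """Rank scores so the highest score gets rank 1; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over a run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kendalls_w(score_table: Sequence[Sequence[float]]) -> float:
    """Kendall's W for score_table[seed][algorithm] (no tie correction applied).

    Returns 1.0 when every seed ranks the algorithms identically and values
    near 0.0 when the seeds disagree about the ranking.
    """
    m = len(score_table)  # number of random seeds ("raters")
    if m < 1:
        raise ValueError("need at least one seed")
    n = len(score_table[0])  # number of algorithms being ranked
    if n < 2:
        raise ValueError("need at least two algorithms to rank")
    rank_rows = [rank_descending(row) for row in score_table]
    # Sum of ranks each algorithm receives across all seeds.
    totals = [sum(row[a] for row in rank_rows) for a in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def randomness_coefficient(score_table: Sequence[Sequence[float]]) -> float:
    """Hypothetical reading: 1 - W, so 0 means rankings are unaffected by the seed."""
    return 1.0 - kendalls_w(score_table)


if __name__ == "__main__":
    # Rows are seeds, columns are three hypothetical algorithms' scores.
    stable = [[0.9, 0.5, 0.1], [0.8, 0.6, 0.2]]
    unstable = [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]]
    print(kendalls_w(stable), randomness_coefficient(unstable))
```

In the stable table both seeds produce the same ranking, so W is 1 and the hypothetical randomness coefficient is 0; in the unstable table the seeds produce opposite rankings, so W is 0. Averaging such a value over datasets and metrics would be one plausible way to summarize how much a benchmark's conclusions depend on randomness, but that aggregation is also an assumption here.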