Dissecting graph measure performance for node clustering in LFR parameter space
This work offers practical guidance for choosing a graph measure for node clustering. Its contribution is incremental: it is a systematic evaluation of existing measures rather than a new method.
The study evaluates 25 graph measures for node clustering on graphs generated across the LFR parameter space. It identifies distinct zones in which a specific measure performs best, which enables recommendations based on graph parameters.
Graph measures that express closeness or distance between nodes can be used to cluster graph nodes with metric clustering algorithms. Numerous measures are applicable to this task, and which one performs best is an open question. We study the performance of 25 graph measures on generated graphs with different parameters. While measure comparisons are usually limited to a general ranking of measures on a particular dataset, we aim to explore how the performance of each measure depends on graph features. Using the LFR graph generator, we create a dataset of 11780 graphs covering the whole LFR parameter space. For each graph, we assess the quality of clustering with the k-means algorithm for each considered measure. Based on this, we determine the best measure for each area of the parameter space. We find that the parameter space consists of distinct zones, in each of which one particular measure is the best. We analyze the geometry of the resulting zones and describe it with simple criteria. Given particular graph parameters, this allows us to recommend a particular measure to use for clustering.
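The per-graph evaluation step described above (compute a measure, cluster with k-means, score the result) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the toy graph is two cliques joined by one edge rather than an LFR graph, shortest-path distance stands in for one of the 25 measures, k-means is plain Lloyd's iteration on the rows of the distance matrix with a deterministic farthest-point start, and the adjusted Rand index (ARI) is used as the quality score. All function names are hypothetical.

```python
from collections import deque
from math import comb


def two_clique_graph(size):
    """Toy stand-in for a generated graph: two cliques joined by one bridge edge."""
    adj = {i: set() for i in range(2 * size)}
    for offset in (0, size):
        for i in range(offset, offset + size):
            for j in range(i + 1, offset + size):
                adj[i].add(j)
                adj[j].add(i)
    adj[size - 1].add(size)
    adj[size].add(size - 1)
    return adj


def shortest_path_matrix(adj):
    """All-pairs shortest-path distances via BFS (assumes a connected graph)."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[src][v] = d
    return dist


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adjusted_rand_index(truth, pred):
    """ARI computed from the contingency table of two labelings."""
    cells, t_counts, p_counts = {}, {}, {}
    for t, p in zip(truth, pred):
        cells[(t, p)] = cells.get((t, p), 0) + 1
        t_counts[t] = t_counts.get(t, 0) + 1
        p_counts[p] = p_counts.get(p, 0) + 1
    index = sum(comb(c, 2) for c in cells.values())
    sum_t = sum(comb(c, 2) for c in t_counts.values())
    sum_p = sum(comb(c, 2) for c in p_counts.values())
    expected = sum_t * sum_p / comb(len(truth), 2)
    max_index = (sum_t + sum_p) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (index - expected) / (max_index - expected)


adj = two_clique_graph(5)
truth = [0] * 5 + [1] * 5            # ground-truth communities
measure = shortest_path_matrix(adj)  # one illustrative distance measure
labels = kmeans(measure, k=2)
ari = adjusted_rand_index(truth, labels)
print(labels, ari)
```

Repeating this scoring for every measure on every generated graph, and keeping the top-scoring measure per region of the parameter space, would give the kind of zone map the abstract describes. The actual measures, k-means variant, and quality index used in the study may differ from this sketch.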