Identifying bias in cluster quality metrics
This work examines bias in cluster evaluation metrics. It is aimed at researchers and practitioners in network analysis and community detection, and represents an incremental improvement.
The researchers tackled the problem of bias in popular cluster quality metrics like conductance and modularity by generating networks with preset community structures using stochastic and preferential attachment block models. They found that most metrics favor partitions with fewer, larger clusters, with modularity and their proposed density ratio metric showing less bias.
We study potential biases of popular cluster quality metrics, such as conductance and modularity. We propose a method that uses both stochastic and preferential attachment block model constructions to generate networks with preset community structures, to which the quality metrics are then applied. These models also allow us to generate multi-level structures of varying strength, which show whether metrics favour partitions into a larger or a smaller number of clusters. Additionally, we propose another quality metric, the density ratio. We observe that most of the studied metrics tend to favour partitions into a smaller number of big clusters, even when the competing partitions have the same relative internal and external connectivity. The metrics found to be less biased are modularity and density ratio.
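The evaluation idea can be illustrated with a minimal sketch: sample a two-level stochastic block model, then score the fine and the coarse partition of the same graph. This is not the authors' implementation. The helper names (`two_level_sbm`, `cluster_stats`, `modularity`, `mean_conductance`) and all parameter values are assumptions chosen for illustration; the preferential attachment variant and the density ratio are omitted because the text does not define them. Modularity and conductance follow their standard textbook definitions.

```python
import random
from itertools import combinations


def two_level_sbm(n_super, n_sub, size, p_sub, p_super, p_out, seed=0):
    """Sample an undirected two-level stochastic block model (hypothetical helper).

    There are n_super coarse blocks, each split into n_sub fine blocks of
    `size` nodes. Returns (edges, fine_labels, coarse_labels).
    """
    rng = random.Random(seed)
    fine = [b for b in range(n_super * n_sub) for _ in range(size)]
    coarse = [b // n_sub for b in fine]
    edges = []
    for u, v in combinations(range(len(fine)), 2):
        if fine[u] == fine[v]:
            p = p_sub        # same fine block
        elif coarse[u] == coarse[v]:
            p = p_super      # same coarse block, different fine block
        else:
            p = p_out        # different coarse blocks
        if rng.random() < p:
            edges.append((u, v))
    return edges, fine, coarse


def cluster_stats(edges, labels):
    """Per-cluster volume (sum of degrees), internal edge count, and cut size."""
    volume, internal, cut = {}, {}, {}
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        volume[cu] = volume.get(cu, 0) + 1
        volume[cv] = volume.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
        else:
            cut[cu] = cut.get(cu, 0) + 1
            cut[cv] = cut.get(cv, 0) + 1
    return volume, internal, cut


def modularity(edges, labels):
    """Newman modularity: sum over clusters of L_c/m - (vol_c / 2m)^2."""
    m = len(edges)
    if m == 0:
        return 0.0
    volume, internal, _ = cluster_stats(edges, labels)
    return sum(internal.get(c, 0) / m - (vol / (2 * m)) ** 2
               for c, vol in volume.items())


def mean_conductance(edges, labels):
    """Average of cut_c / min(vol_c, 2m - vol_c) over clusters (lower is better).

    Clusters for which the denominator is zero are skipped.
    """
    m = len(edges)
    volume, _, cut = cluster_stats(edges, labels)
    scores = []
    for c, vol in volume.items():
        denom = min(vol, 2 * m - vol)
        if denom > 0:
            scores.append(cut.get(c, 0) / denom)
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative parameters only: 2 coarse blocks, each made of 2 fine blocks.
edges, fine, coarse = two_level_sbm(
    n_super=2, n_sub=2, size=20, p_sub=0.6, p_super=0.2, p_out=0.05, seed=1
)
for name, labels in (("fine, 4 clusters", fine), ("coarse, 2 clusters", coarse)):
    print(name,
          "modularity:", round(modularity(edges, labels), 3),
          "mean conductance:", round(mean_conductance(edges, labels), 3))
```

Comparing the two printed rows for several choices of `p_sub`, `p_super`, and `p_out` gives a rough sense of which level a metric prefers. A single sample is noisy, so any real comparison would average over many seeds.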