Graphon based Clustering and Testing of Networks: Algorithms and Theory
This work addresses clustering and testing challenges for network data in domains like protein structures and social networks, representing an incremental advance with theoretical backing.
The authors tackled the problem of clustering and testing networks without vertex correspondence by proposing a novel graph distance based on graphon estimators; two clustering algorithms built on this distance achieve state-of-the-art results and are proven statistically consistent under Lipschitz assumptions.
Network-valued data are encountered in a wide range of applications and pose learning challenges due to their complex structure and the absence of vertex correspondence. Typical examples include classifying or grouping protein structures and social networks. Various methods, ranging from graph kernels to graph neural networks, have achieved some success in graph classification. However, most methods have limited theoretical justification, and their applicability beyond classification remains unexplored. In this work, we propose methods for clustering multiple graphs without vertex correspondence, inspired by the recent literature on estimating graphons (symmetric functions corresponding to the infinite-vertex limit of graphs). We propose a novel graph distance based on sorting-and-smoothing graphon estimators. Using this distance, we present two clustering algorithms and show that they achieve state-of-the-art results. We prove the statistical consistency of both algorithms under Lipschitz assumptions on the graph degrees. We further study the applicability of the proposed distance to graph two-sample testing problems.
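To make the idea of a graphon-based distance concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It sorts vertices by degree and then "smooths" by plain block averaging onto an `n0 x n0` grid, a simple stand-in for whatever smoothing the paper uses. It takes the L2 distance between the resulting step-function estimates, which works for graphs of different sizes because every estimate has the same shape. The function names, the grid size `n0`, the Gaussian similarity with a median bandwidth, and the generic spectral clustering step are all hypothetical illustration choices. They are not necessarily either of the paper's two clustering algorithms.

```python
import numpy as np


def sas_graphon_estimate(adj, n0):
    """Hypothetical sorting-and-smoothing sketch: sort vertices by degree,
    then average the sorted adjacency matrix over an n0 x n0 grid of blocks."""
    n = adj.shape[0]
    if not 1 <= n0 <= n:
        raise ValueError("need 1 <= n0 <= number of vertices")
    # Sort vertices by descending degree (stable, so ties keep input order).
    order = np.argsort(-adj.sum(axis=1), kind="stable")
    sorted_adj = adj[np.ix_(order, order)]
    # Partition the sorted vertex indices into n0 nearly equal contiguous blocks.
    blocks = np.array_split(np.arange(n), n0)
    est = np.empty((n0, n0))
    for i, rows in enumerate(blocks):
        for j, cols in enumerate(blocks):
            est[i, j] = sorted_adj[np.ix_(rows, cols)].mean()
    return est


def graph_distance(adj_a, adj_b, n0=10):
    """L2 distance between two step-function graphon estimates.
    Each block has area 1/n0**2, so the L2 norm equals Frobenius / n0."""
    diff = sas_graphon_estimate(adj_a, n0) - sas_graphon_estimate(adj_b, n0)
    return np.linalg.norm(diff) / n0


def spectral_cluster(dist, k, sigma=None, n_iter=50):
    """Generic (hypothetical) distance-based spectral clustering:
    Gaussian similarity, top-k eigenvectors, then Lloyd's k-means."""
    if sigma is None:
        positive = dist[dist > 0]
        sigma = np.median(positive) if positive.size else 1.0
    sim = np.exp(-(dist / sigma) ** 2)
    # eigh returns eigenvalues in ascending order; keep the k largest.
    _, vecs = np.linalg.eigh(sim)
    emb = vecs[:, -k:]
    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [emb[0]]
    for _ in range(1, k):
        d = np.min([np.sum((emb - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(d)])
    centers = np.array(centers)

    def assign():
        return np.argmin(((emb[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)

    for _ in range(n_iter):
        labels = assign()
        for c in range(k):
            if np.any(labels == c):
                centers[c] = emb[labels == c].mean(axis=0)
    return assign()


def sample_graph(graphon, n, rng):
    """Draw an n-vertex simple undirected graph from a graphon W(x, y)."""
    u = rng.uniform(size=n)
    probs = graphon(u[:, None], u[None, :])
    upper = np.triu(rng.uniform(size=(n, n)) < probs, k=1)
    return (upper | upper.T).astype(float)


# Two hypothetical graphons, chosen only for illustration.
def w_product(x, y):
    return 0.8 * x * y


def w_constant(x, y):
    return 0.3 + 0.0 * (x + y)


rng = np.random.default_rng(0)
# Graphs of different sizes: no vertex correspondence is needed.
graphs = [sample_graph(w_product, int(n), rng) for n in rng.integers(200, 300, size=5)]
graphs += [sample_graph(w_constant, int(n), rng) for n in rng.integers(200, 300, size=5)]
dist = np.array([[graph_distance(a, b) for b in graphs] for a in graphs])
labels = spectral_cluster(dist, k=2)
print(labels)
```

Block averaging is used here only because it is the simplest smoothing that turns a degree-sorted adjacency matrix into a fixed-size estimate; a finer grid (larger `n0`) trades lower bias for higher variance per block.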