Classification on Large Networks: A Quantitative Bound via Motifs and Graphons
This work addresses the classification of large graph data with theoretical guarantees, a problem relevant to domains such as medical diagnostics. The contribution appears incremental in that it builds on existing graphon theory.
The authors give explicit quantitative bounds on how well motif homomorphisms and graph spectra distinguish large networks under noise. The result is a classifier with a theoretical performance guarantee that yields competitive results on classification tasks for Lupus Erythematosus.
When each data point is a large graph, graph statistics such as densities of certain subgraphs (motifs) can be used as feature vectors for machine learning. While intuitive, motif counts are expensive to compute and difficult to work with theoretically. Via graphon theory, we give an explicit quantitative bound for the ability of motif homomorphisms to distinguish large networks under both generative and sampling noise. Furthermore, we give similar bounds for the graph spectrum and connect it to homomorphism densities of cycles. This results in an easily computable classifier on graph data with a theoretical performance guarantee. Our method yields competitive results on classification tasks for the autoimmune disease Lupus Erythematosus.
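
The connection between the graph spectrum and cycle homomorphism densities can be illustrated with a standard identity: for a simple graph G on n vertices with adjacency matrix A and eigenvalues lambda_1, ..., lambda_n, the number of homomorphisms from the k-cycle C_k into G is trace(A^k) = sum_i lambda_i^k, so the homomorphism density is t(C_k, G) = trace(A^k) / n^k. The following is a minimal sketch of that identity, not the authors' implementation; the function names, the Erdős-Rényi sampler, and the range of cycle lengths are assumptions made for illustration.

```python
import numpy as np


def cycle_hom_density_trace(adj, k):
    """t(C_k, G) = hom(C_k, G) / n^k, using hom(C_k, G) = trace(A^k)."""
    n = adj.shape[0]
    return np.trace(np.linalg.matrix_power(adj, k)) / n**k


def cycle_hom_density_spectrum(adj, k):
    """Same quantity from the spectrum: trace(A^k) = sum_i lambda_i^k."""
    n = adj.shape[0]
    # eigvalsh assumes a symmetric matrix, i.e. an undirected graph.
    eigenvalues = np.linalg.eigvalsh(adj)
    return np.sum(eigenvalues**k) / n**k


def cycle_features(adj, max_k=6):
    """Hypothetical feature vector: cycle densities for k = 3, ..., max_k."""
    return np.array(
        [cycle_hom_density_spectrum(adj, k) for k in range(3, max_k + 1)]
    )


def sample_erdos_renyi(n, p, rng):
    """Illustrative sampler: symmetric 0/1 adjacency matrix, no self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)


# Complete graph on 3 vertices: 6 closed walks of length 3, so t = 6 / 27.
triangle = np.ones((3, 3)) - np.eye(3)

rng = np.random.default_rng(0)
sparse_graph = sample_erdos_renyi(200, 0.1, rng)
dense_graph = sample_erdos_renyi(200, 0.3, rng)
sparse_features = cycle_features(sparse_graph)
dense_features = cycle_features(dense_graph)
```

Computing all cycle densities from a single eigendecomposition avoids enumerating subgraphs, which is one way to read the abstract's claim that the resulting features are easy to compute. The feature vectors above could be passed to any standard classifier; which classifier the authors use is not stated in this section.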