LG DS | Jun 4, 2024

GEFL: Extended Filtration Learning for Graph Classification

Simon Zhang, Soham Mukherjee, Tamal K. Dey

arXiv:2406.02732v114.213 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses graph classification by capturing more global topological information, which is relevant to domains such as bioinformatics and social network analysis. The advance is incremental in that it combines known techniques.

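The authors integrate extended persistence, a topological data analysis technique, into a supervised learning framework for graph classification. They report a speedup of more than 60x over the state of the art for extended persistence computation, show that under certain conditions extended persistence is more expressive than both the WL[1] graph isomorphism test and 0-dimensional barcodes, and demonstrate the method's effectiveness on real-world datasets.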

Extended persistence is a technique from topological data analysis to obtain global multiscale topological information from a graph. This includes information about connected components and cycles that are captured by the so-called persistence barcodes. We introduce extended persistence into a supervised learning framework for graph classification. Global topological information, in the form of a barcode with four different types of bars and their explicit cycle representatives, is combined into the model by the readout function which is computed by extended persistence. The entire model is end-to-end differentiable. We use a link-cut tree data structure and parallelism to lower the complexity of computing extended persistence, obtaining a speedup of more than 60x over the state-of-the-art for extended persistence computation. This makes extended persistence feasible for machine learning. We show that, under certain conditions, extended persistence surpasses both the WL[1] graph isomorphism test and 0-dimensional barcodes in terms of expressivity because it adds more global (topological) information. In particular, arbitrarily long cycles can be represented, which is difficult for finite receptive field message passing graph neural networks. Furthermore, we show the effectiveness of our method on real world datasets compared to many existing recent graph representation learning methods.
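To make the barcode terminology above concrete, here is a minimal Python sketch. It is not the authors' implementation: it uses a plain union-find instead of a link-cut tree, has no parallelism, and is not differentiable. It assumes a graph with one scalar filter value per vertex, where an edge enters the filtration at the larger of its endpoint values. The function name `lower_star_barcode` and its return layout are hypothetical. It computes the finite 0-dimensional bars, the (min, max) bar of each connected component, and the number of independent cycles. It does not compute the birth and death values of the cycle bars, which is the part that requires the more involved pairing step.

```python
def lower_star_barcode(num_vertices, edges, f):
    """Sketch: 0-dimensional bars of a lower-star graph filtration.

    Returns (finite_bars, component_bars, num_cycles). Illustrative only;
    the name and return layout are assumptions, not taken from the paper.
    """
    parent = list(range(num_vertices))
    lo = list(f)  # lo[r]: smallest filter value in the component rooted at r
    hi = list(f)  # hi[r]: largest filter value in the component rooted at r

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    finite_bars = []
    num_cycles = 0
    # An edge appears at the larger of its two endpoint values.
    for u, v in sorted(edges, key=lambda e: max(f[e[0]], f[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            # The edge closes a cycle instead of merging two components.
            num_cycles += 1
            continue
        # Elder rule: the component with the larger minimum (rv) dies.
        if lo[ru] > lo[rv]:
            ru, rv = rv, ru
        death = max(f[u], f[v])
        if death > lo[rv]:  # skip zero-length bars
            finite_bars.append((lo[rv], death))
        parent[rv] = ru
        hi[ru] = max(hi[ru], hi[rv])

    roots = {find(v) for v in range(num_vertices)}
    component_bars = sorted((lo[r], hi[r]) for r in roots)
    return finite_bars, component_bars, num_cycles


# A 4-cycle with vertex filter values 0, 3, 1, 4.
f = [0, 3, 1, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
ord0, ext0, n_cycles = lower_star_barcode(4, edges, f)

# Running the same sweep on -f and negating the result gives the bars
# of the descending (upper) filtration.
neg_bars, _, _ = lower_star_barcode(4, edges, [-x for x in f])
rel1 = [(-b, -d) for b, d in neg_bars]
```

In this sketch, `ord0`, `ext0`, and `rel1` correspond to three of the four bar types mentioned above, and `n_cycles` only counts how many cycle bars there would be. Pairing each cycle with its birth and death values is the step for which the abstract describes using a link-cut tree.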
