LG SI

Feb 22, 2024

Representation Learning for Frequent Subgraph Mining

Rex Ying, Tianyu Fu, Andrew Wang, Jiaxuan You, Yu Wang, Jure Leskovec

Tsinghua

arXiv:2402.14367v1

11.514 citations

h-index: 30

Originality: Incremental advance

AI Analysis

This paper addresses a crucial problem in network analysis for researchers and practitioners and offers a scalable solution for motif discovery. The advance is incremental, however, because it builds on existing neural and embedding techniques.

The paper tackles the problem of identifying frequent subgraphs (network motifs) in large graphs, which is challenging because subgraph counting is NP-hard and the number of candidate patterns grows exponentially. It presents SPMiner, a neural approach that identifies the most frequent 5- and 6-node motifs with near-perfect accuracy while running 100x faster than exact methods. It also finds larger motifs of up to 20 nodes whose frequency is 10-100x higher than that of motifs found by current approximate methods.

Identifying frequent subgraphs, also called network motifs, is crucial in analyzing and predicting properties of real-world networks. However, finding large commonly-occurring motifs remains a challenging problem, not only due to its NP-hard subroutine of subgraph counting, but also due to the exponential growth of the number of possible subgraph patterns. Here we present Subgraph Pattern Miner (SPMiner), a novel neural approach for approximately finding frequent subgraphs in a large target graph. SPMiner combines graph neural networks, an order embedding space, and an efficient search strategy to identify the subgraph patterns that appear most frequently in the target graph. SPMiner first decomposes the target graph into many overlapping subgraphs and then encodes each subgraph into an order embedding space. SPMiner then uses a monotonic walk in the order embedding space to identify frequent motifs. Compared to existing approaches and possible neural alternatives, SPMiner is more accurate, faster, and more scalable. For 5- and 6-node motifs, we show that SPMiner can almost perfectly identify the most frequent motifs while being 100x faster than exact enumeration methods. In addition, SPMiner can reliably identify frequent 10-node motifs, which is well beyond the size limit of exact enumeration approaches. Finally, we show that SPMiner can find large motifs of up to 20 nodes with 10-100x higher frequency than those found by current approximate methods.
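The search idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not spell out: that a pattern is treated as contained in a subgraph when its embedding is coordinate-wise less than or equal to the subgraph's embedding, that frequency is estimated by counting such subgraph embeddings, and that the walk greedily picks the most frequent valid extension. The function names (`order_violation`, `estimated_frequency`, `monotonic_walk`) and the hand-made 2-D vectors are hypothetical; in SPMiner the embeddings would come from a trained graph neural network.

```python
from typing import List, Sequence

Embedding = Sequence[float]


def order_violation(pattern: Embedding, subgraph: Embedding) -> float:
    """Squared amount by which `pattern` exceeds `subgraph` coordinate-wise.

    A value of 0 means pattern <= subgraph in every coordinate, which this
    sketch ASSUMES encodes "pattern is a subgraph of subgraph".
    """
    return sum(max(0.0, p - s) ** 2 for p, s in zip(pattern, subgraph))


def estimated_frequency(pattern: Embedding,
                        subgraphs: List[Embedding],
                        tol: float = 1e-9) -> int:
    """Count the subgraph embeddings that dominate `pattern` (assumed proxy for frequency)."""
    return sum(1 for s in subgraphs if order_violation(pattern, s) <= tol)


def monotonic_walk(start: Embedding,
                   candidates_per_step: List[List[Embedding]],
                   subgraphs: List[Embedding],
                   tol: float = 1e-9) -> List[Embedding]:
    """Greedily grow a pattern, never decreasing any coordinate of its embedding."""
    path = [start]
    current = start
    for candidates in candidates_per_step:
        # Keep only extensions that dominate the current pattern, so the walk stays monotone.
        valid = [c for c in candidates if order_violation(current, c) <= tol]
        if not valid:
            break
        # Hypothetical greedy rule: take the extension with the highest estimated frequency.
        current = max(valid, key=lambda c: estimated_frequency(c, subgraphs))
        path.append(current)
    return path


# Toy stand-ins for the embeddings of overlapping subgraphs of the target graph.
toy_subgraphs = [(3, 3), (2, 4), (4, 1), (1, 1)]
# Toy candidate extensions offered at each growth step.
toy_candidates = [[(2, 1), (1, 3), (0, 5)], [(2, 3), (4, 1)]]

print(monotonic_walk((1, 1), toy_candidates, toy_subgraphs))
# prints [(1, 1), (2, 1), (2, 3)]
```

In the toy run, the candidate `(0, 5)` is rejected because it would decrease the first coordinate, and at each step the surviving candidate dominated by the most subgraph embeddings is chosen. The estimated frequency can only stay the same or drop as the pattern grows, which matches the intuition that larger motifs occur no more often than their sub-patterns.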
