CEMR: An Effective Subgraph Matching Algorithm with Redundant Extension Elimination
This work provides a more efficient subgraph matching algorithm for researchers and practitioners working with large graph analysis problems, offering a strong specific gain in performance.
The paper addresses the challenge of efficiently enumerating subgraph matches in large graphs, a known NP-hard problem. It introduces CEMR, a novel algorithm that significantly reduces duplicate computations during the search process, leading to superior performance compared to existing state-of-the-art methods on real-world datasets.
Subgraph matching is a fundamental problem in graph analysis with a wide range of applications. However, due to its inherent NP-hardness, enumerating subgraph matches efficiently on large real-world graphs remains highly challenging. Most existing works adopt a depth-first search (DFS) backtracking strategy, where a partial embedding is gradually extended in a DFS manner along a branch of the search trees until either a full embedding is found or no further extension is possible. A major limitation of this paradigm is the significant amount of duplicate computation that occurs during enumeration, which increases the overall runtime. To overcome this limitation, we propose a novel subgraph matching algorithm, CEMR. It incorporates two techniques to reduce duplicate extensions: common extension merging, which leverages a black-white vertex encoding, and common extension reusing, which employs common extension buffers. In addition, we design two pruning techniques to discard unpromising search branches. Extensive experiments on real-world datasets and diverse query workloads demonstrate that CEMR outperforms state-of-the-art subgraph matching methods.