DS LG | Feb 23, 2021

The Power of $D$-hops in Matching Power-Law Graphs

arXiv:2102.12975v15.915 citations

Originality: Highly original

AI Analysis

This work addresses the challenge of efficiently matching large-scale power-law graphs from minimal seed data. The approach is an incremental step beyond prior seeded-matching methods, yet it yields an exponential reduction in the number of seeds required.

This paper tackles the problem of seeded graph matching for power-law graphs by developing an algorithm that exploits low-degree seeds in D-hop neighborhoods, reducing the required initial seeds from n^(1/2+ε) to Ω((log n)^(4-β)) and correctly matching a constant fraction of vertex pairs with high probability.
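To give a feel for the gap between the two seed requirements quoted above, the following minimal sketch evaluates both growth rates for a sample graph size. It drops the hidden constants in the Ω(·) and n^(1/2+ε) bounds, so the numbers only illustrate scaling and are not actual seed counts. The function names and the sample values of n, β, and ε are assumptions chosen for illustration.

```python
import math


def log_seed_scale(n: int, beta: float) -> float:
    """Growth rate (log n)^(4 - beta), with the constant factor dropped."""
    return math.log(n) ** (4 - beta)


def poly_seed_scale(n: int, eps: float) -> float:
    """Growth rate n^(1/2 + eps), with the constant factor dropped."""
    return n ** (0.5 + eps)


# Illustrative values only (assumptions, not taken from the paper).
n, beta, eps = 10**6, 2.5, 0.01
print(f"(log n)^(4-beta) ~ {log_seed_scale(n, beta):.1f}")
print(f"n^(1/2+eps)      ~ {poly_seed_scale(n, eps):.1f}")
```

The polylogarithmic rate grows far more slowly than the polynomial one, which is the sense in which the reduction is exponential.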

This paper studies seeded graph matching for power-law graphs. Assume that two edge-correlated graphs are independently edge-sampled from a common parent graph with a power-law degree distribution. A set of correctly matched vertex-pairs is chosen at random and revealed as initial seeds. Our goal is to use the seeds to recover the remaining latent vertex correspondence between the two graphs. Departing from the existing approaches that focus on the use of high-degree seeds in $1$-hop neighborhoods, we develop an efficient algorithm that exploits the low-degree seeds in suitably-defined $D$-hop neighborhoods. Specifically, we first match a set of vertex-pairs with appropriate degrees (which we refer to as the first slice) based on the number of low-degree seeds in their $D$-hop neighborhoods. This significantly reduces the number of initial seeds needed to trigger a cascading process to match the rest of the graphs. Under the Chung-Lu random graph model with $n$ vertices, max degree $Θ(\sqrt{n})$, and the power-law exponent $2<β<3$, we show that as soon as $D> \frac{4-β}{3-β}$, by optimally choosing the first slice, with high probability our algorithm can correctly match a constant fraction of the true pairs without any error, provided with only $Ω((\log n)^{4-β})$ initial seeds. Our result achieves an exponential reduction in the seed size requirement, as the best previously known result requires $n^{1/2+ε}$ seeds (for any small constant $ε>0$). Performance evaluation with synthetic and real data further corroborates the improved performance of our algorithm.
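The core idea in the abstract, scoring a candidate pair by the number of low-degree seeds in its $D$-hop neighborhoods, can be sketched as follows. This is a simplified illustration and not the paper's algorithm: it assumes a plain breadth-first $D$-hop neighborhood (the paper uses a "suitably-defined" one), and the helper names `min_hops`, `d_hop_neighborhood`, and `count_seed_witnesses`, the degree cap `max_seed_degree`, and the toy graph are all hypothetical.

```python
import math
from collections import deque


def min_hops(beta: float) -> int:
    """Smallest integer D with D > (4 - beta) / (3 - beta), for 2 < beta < 3."""
    if not 2 < beta < 3:
        raise ValueError("beta must satisfy 2 < beta < 3")
    return math.floor((4 - beta) / (3 - beta)) + 1


def d_hop_neighborhood(adj: dict, source, d: int) -> set:
    """Vertices at distance 1..d from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand beyond d hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]
    return set(dist)


def count_seed_witnesses(adj1, adj2, seeds, u, v, d, max_seed_degree):
    """Count seed pairs (s1, s2) such that s1 is within d hops of u in graph 1,
    s2 is within d hops of v in graph 2, and both have low degree."""
    near_u = d_hop_neighborhood(adj1, u, d)
    near_v = d_hop_neighborhood(adj2, v, d)
    return sum(
        1
        for s1, s2 in seeds.items()
        if s1 in near_u
        and s2 in near_v
        and len(adj1[s1]) <= max_seed_degree
        and len(adj2[s2]) <= max_seed_degree
    )


# Toy example: both graphs are the same small tree (an assumption for brevity).
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0, 5], 5: [4]}
seeds = {3: 3, 5: 5}  # known correct pairs, both of degree 1

true_pair_score = count_seed_witnesses(adj, adj, seeds, 0, 0, d=3, max_seed_degree=2)
wrong_pair_score = count_seed_witnesses(adj, adj, seeds, 0, 3, d=3, max_seed_degree=2)
```

In this toy setting the true pair (0, 0) collects more seed witnesses than the wrong pair (0, 3). A matching rule would then accept pairs whose count clears a threshold. How that threshold and the first slice are chosen is the subject of the paper's analysis and is not reproduced here.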
