DSMay 18

Estimating Random-Walk Probabilities in Directed Graphs

Christian Bertram, Mads Vestergaard Jensen, Mikkel Thorup, Hanzhi Wang, Shuyi Yan

arXiv:2504.164816.94 citationsh-index: 4

AI Analysis

For researchers and practitioners working on graph analysis and web search, this work provides definitive complexity bounds for a fundamental estimation problem, resolving long-standing open questions.

This paper closes all existing gaps between upper and lower bounds for estimating Personalized PageRank (discounted random walk probabilities) in directed graphs across all problem variants and query combinations, achieving tight bounds (up to log factors) in both worst-case and average-case settings. For example, they propose algorithms that deterministically estimate arbitrarily small probabilities in O(m log n) time, and improve the single-pair lower bound from Ω(min{n,1/δ}) to Ω(min{m,1/δ}).

We study discounted random walks in directed graphs. In each step, the walk either terminates with a constant probability $α$, or proceeds to a random out-neighbor. Our goal is to estimate the probability $π(s, t)$ that a discounted random walk starting from $s$ terminates at $t$. This probability is also known as the Personalized PageRank (PPR) score, which measures the relevance of $t$ to $s$, for instance, when $s$ and $t$ are web pages on the Internet. We aim to estimate $π(s, t)$ within a constant relative error with constant probability. A variety of algorithms have been developed for several problem variants, such as single-pair, single-source, single-target, and single-node estimation, under both worst-case and average-case settings, and for different combinations of allowed graph queries. However, in many important cases, there remain polynomial gaps between known upper and lower bounds. In this paper, we establish tight upper and lower bounds (up to logarithmic factors of $n$) for all problem variants and query combinations, closing all existing gaps in both the worst-case and average-case settings. Below we give some examples for the worst-case settings. As an upper-bound example, the classic power method estimates $π(s,t)$ if it is above a threshold $δ$ in time $O(m\log(1/δ))$ but $π(s,t)$ can be as small as $1/n^{Θ(n)}$. For contrast, we propose algorithms that deterministically estimate arbitrarily small $π(s,t)$ in $O(m\log n)$ time. As a lower-bound example, we improve the lower bound for the single-pair problem from $Ω(\min\{n,1/δ\})$ to $Ω(\min\{m,1/δ\})$, which is optimal (up to logarithmic factors) since a simple Monte Carlo estimate takes $O(1/δ)$ time.

View on arXiv PDF

Similar