Linking PageRank, Time Reversal, and Policy Evaluation

arXiv:2605.0053270.0
AI Analysis

This work offers a new theoretical bridge between reinforcement learning and network analysis, potentially enabling the use of efficient PageRank algorithms for policy evaluation in MDPs.

The paper establishes a theoretical connection between policy evaluation in Markov decision processes and PageRank, showing that the value function can be obtained from PageRank vectors via time reversal. It provides a decomposition theorem for arbitrary finite MDPs and demonstrates efficiency on a numerical example with large graphs.

We establish a connection between policy evaluation in Markov decision processes and PageRank in network analysis. For a fixed policy, we show that the value function of a discounted Markov decision process can be obtained, up to an explicit rescaling, from the PageRank vector of a suitably defined time-reversed Markov chain. In this correspondence, the discount factor plays the role of the teleportation parameter, while rewards induce the restart distribution. Beyond the irreducible case, invoking quasi-stationary distributions and Doob $h$-transforms, we prove a general decomposition theorem showing that policy evaluation for arbitrary finite MDPs reduces to a collection of PageRank problems on the recurrent and transient components of the policy-induced Markov chain. This framework naturally extends to undiscounted MDPs with terminal states and to transition-dependent rewards. We conclude by showing efficiency of our approach on a numerical example of a sticky random walk on large deterministic and random graphs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes