Locality, Not Spectral Mixing, Governs Direct Propagation in Distributed Offline Dynamic Programming
For distributed reinforcement learning and dynamic programming, this work clarifies the fundamental communication barrier, showing that gossip-based methods are suboptimal and that direct propagation is provably optimal.
The paper proves that locality, not spectral mixing, governs the round complexity of distributed offline dynamic programming, showing direct propagation achieves optimal scaling while gossip methods incur an extra dependence on the spectral gap.
We study the communication complexity of distributed offline dynamic programming, where a fixed batch dataset is partitioned across (M) machines connected by the data-induced dependency graph. We compare two paradigms: direct boundary-value propagation, which follows Bellman dependencies, and gossip averaging, which mixes local estimates. Our results show that **locality** is the fundamental driver of round complexity. In particular, we prove that no method can achieve (\varepsilon)-accuracy in fewer than (L_\varepsilon = \left\lfloor \log(1/2\varepsilon) / \log(1/γ) \right\rfloor) rounds on graphs of diameter at least (L_\varepsilon), and we show that direct propagation matches this scaling up to constants, attaining error (O(γ^T/(1-γ) + δ/(1-γ))) after (T) rounds. In contrast, gossip-style fitted value iteration incurs an additional (1/\mathrm{gap}(W)) dependence in both convergence rate and asymptotic error. We also prove bandwidth-sensitive lower bounds on path topologies and extend the analysis to asynchronous systems with bounded delays. Together, these results show that spectral dependence is an artifact of gossip-based algorithms, whereas locality is the intrinsic barrier in distributed offline dynamic programming.