Lower Bound On the Computational Complexity of Discounted Markov Decision Problems
This provides foundational lower bounds for MDP algorithms, impacting researchers and practitioners in reinforcement learning and optimization by clarifying computational limits.
The paper tackles the computational complexity of infinite-horizon discounted-reward Markov Decision Problems (MDPs) with finite state and action spaces, showing that any randomized algorithm requires at least Ω(|S|²|A|) time to compute an ε-optimal policy with high probability, and reveals that complexity depends on input data structures, reducing to Ω(|S||A|/ε) for specific variants.
We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $|\mathcal{S}|$ and a finite action space $|\mathcal{A}|$. We show that any randomized algorithm needs a running time at least $Ω(|\mathcal{S}|^2|\mathcal{A}|)$ to compute an $ε$-optimal policy with high probability. We consider two variants of the MDP where the input is given in specific data structures, including arrays of cumulative probabilities and binary trees of transition probabilities. For these cases, we show that the complexity lower bound reduces to $Ω\left( \frac{|\mathcal{S}| |\mathcal{A}|}ε \right)$. These results reveal a surprising observation that the computational complexity of the MDP depends on the data structure of input.