OC LGJan 6, 2019

Large-Scale Markov Decision Problems via the Linear Programming Dual

Yasin Abbasi-Yadkori, Peter L. Bartlett, Xi Chen, Alan Malek

arXiv:1901.01992v112.113 citations

Originality Incremental advance

AI Analysis

This work addresses the computational challenge of controlling large MDPs for researchers and practitioners in reinforcement learning and operations research, representing an incremental advance by introducing a new tractable optimization approach for a known bottleneck.

The paper tackles the intractable planning problem in large-scale Markov decision processes by optimizing over a small family of policies derived from low-dimensional approximations of occupancy measures, proposing an efficient algorithm that scales with subspace size rather than state space and bounds excess loss in average and discounted cost cases, with preliminary experiments demonstrating effectiveness in a queueing application.

We consider the problem of controlling a fully specified Markov decision process (MDP), also known as the planning problem, when the state space is very large and calculating the optimal policy is intractable. Instead, we pursue the more modest goal of optimizing over some small family of policies. Specifically, we show that the family of policies associated with a low-dimensional approximation of occupancy measures yields a tractable optimization. Moreover, we propose an efficient algorithm, scaling with the size of the subspace but not the state space, that is able to find a policy with low excess loss relative to the best policy in this class. To the best of our knowledge, such results did not exist in the literature previously. We bound excess loss in the average cost and discounted cost cases, which are treated separately. Preliminary experiments show the effectiveness of the proposed algorithms in a queueing application.

View on arXiv PDF

Similar