LGMLJul 13, 2020

Efficient Planning in Large MDPs with Weak Linear Function Approximation

arXiv:2007.06184v122 citations
AI Analysis

This addresses the challenge of efficient planning in large-scale MDPs for applications like robotics or AI, though it is incremental as it builds on existing linear approximation methods with relaxed assumptions.

The paper tackles the problem of planning in large Markov decision processes (MDPs) by using weak linear function approximation, requiring only low approximation error for the optimal value function and a small set of core states, and it achieves almost-optimal actions with computation time scaling polynomially with features, core states, actions, and effective horizon.

Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions and the effective horizon.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes