LGMLFeb 27, 2023

Optimistic Planning by Regularized Dynamic Programming

arXiv:2302.14004v38 citationsh-index: 28
Originality Incremental advance
AI Analysis

This work addresses planning and learning in complex MDPs, offering a computationally efficient algorithm with theoretical guarantees, though it appears incremental as it builds on existing approximate dynamic programming methods.

The authors tackled the problem of optimistic planning in infinite-horizon discounted Markov decision processes by proposing a new method based on regularized dynamic programming, which avoids traditional contraction and monotonicity arguments and achieves near-optimal statistical guarantees for learning policies in discounted linear mixture MDPs from a single stream of experience.

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes