LGOCMLSep 10, 2019

Online Planning with Lookahead Policies

arXiv:1909.04236v211 citations
Originality Highly original
AI Analysis

This work addresses the challenge of efficient online planning in large state spaces for researchers in reinforcement learning and AI, offering a novel theoretical improvement over existing methods.

The paper tackles the problem of improving sample complexity in online planning by introducing $h$-RTDP, a multi-step greedy algorithm that replaces the 1-step greedy policy in Real Time Dynamic Programming with an $h$-step lookahead policy, proving that increasing the lookahead horizon reduces sample complexity at the cost of more computations.

Real Time Dynamic Programming (RTDP) is an online algorithm based on Dynamic Programming (DP) that acts by 1-step greedy planning. Unlike DP, RTDP does not require access to the entire state space, i.e., it explicitly handles the exploration. This fact makes RTDP particularly appealing when the state space is large and it is not possible to update all states simultaneously. In this we devise a multi-step greedy RTDP algorithm, which we call $h$-RTDP, that replaces the 1-step greedy policy with a $h$-step lookahead policy. We analyze $h$-RTDP in its exact form and establish that increasing the lookahead horizon, $h$, results in an improved sample complexity, with the cost of additional computations. This is the first work that proves improved sample complexity as a result of {\em increasing} the lookahead horizon in online planning. We then analyze the performance of $h$-RTDP in three approximate settings: approximate model, approximate value updates, and approximate state representation. For these cases, we prove that the asymptotic performance of $h$-RTDP remains the same as that of a corresponding approximate DP algorithm, the best one can hope for without further assumptions on the approximation errors.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes