LG | AI | Jan 28, 2022

Planning and Learning with Adaptive Lookahead

arXiv:2201.12403v2 | 13 citations
AI Analysis

This work addresses a bottleneck in planning-based RL, the fixed planning horizon, with the aim of improving efficiency, though the contribution appears incremental.

The paper tackles the problem of fixed planning horizons in reinforcement learning by proposing an adaptive lookahead selection strategy based on state-dependent value estimates, and it demonstrates the method's efficacy in maze and Atari environments.

Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.
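The idea of choosing the planning depth per state can be illustrated with a small tabular sketch. This is not the paper's algorithm: the chain MDP, the names `step`, `lookahead_value`, `choose_depth` and `adaptive_action`, and the depth rule itself (take the smallest depth whose lookahead value improves on the current estimate by at least `min_gain`, capped at `max_depth`) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical toy MDP (an assumption, not from the paper): a chain of
# N_STATES states. Action 0 moves left, action 1 moves right. Entering the
# last state pays reward 1.0, and the last state is absorbing.
N_STATES = 6
ACTIONS = (0, 1)
GAMMA = 0.9


def step(state: int, action: int) -> Tuple[int, float]:
    """Deterministic transition: returns (next_state, reward)."""
    if state == N_STATES - 1:
        return state, 0.0
    nxt = max(0, state - 1) if action == 0 else state + 1
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)


def lookahead_value(state: int, depth: int, values: List[float]) -> float:
    """Best depth-step return from `state`, bootstrapping with `values` at the leaves."""
    if depth == 0:
        return values[state]
    best = float("-inf")
    for action in ACTIONS:
        nxt, reward = step(state, action)
        best = max(best, reward + GAMMA * lookahead_value(nxt, depth - 1, values))
    return best


def choose_depth(state: int, values: List[float],
                 max_depth: int = 4, min_gain: float = 0.05) -> int:
    """Illustrative rule (an assumption): the smallest depth whose lookahead
    value exceeds the current estimate by at least `min_gain`; fall back to
    `max_depth` if no depth achieves that."""
    for depth in range(1, max_depth + 1):
        if lookahead_value(state, depth, values) - values[state] >= min_gain:
            return depth
    return max_depth


def adaptive_action(state: int, values: List[float],
                    max_depth: int = 4, min_gain: float = 0.05) -> int:
    """Greedy action under a tree search whose depth is chosen per state."""
    depth = choose_depth(state, values, max_depth, min_gain)

    def q(action: int) -> float:
        nxt, reward = step(state, action)
        return reward + GAMMA * lookahead_value(nxt, depth - 1, values)

    return max(ACTIONS, key=q)


if __name__ == "__main__":
    values = [0.0] * N_STATES  # uninformative initial value estimate
    for s in range(N_STATES - 1):
        print(s, choose_depth(s, values), adaptive_action(s, values))
```

With an all-zero value estimate, this rule assigns depths 1, 2 and 3 to states 4, 3 and 2: states close to the reward are handled by a shallow search, and deeper search is spent only where a shallow one finds no improvement. The exhaustive recursion costs on the order of |A|^depth evaluations per state, which makes visible the trade-off between fewer iterations and a higher cost per iteration that the abstract mentions.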

