LG | AI | Jun 16, 2022

A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings

arXiv:2206.08442v3 | 8 citations | h-index: 65
Originality: Synthesis-oriented
AI Analysis

This work provides insights for RL practitioners on method selection, but it is incremental as it builds on existing planning paradigms.

The study compares value-based decision-time and background planning methods in model-based reinforcement learning, finding that modern instantiations of decision-time planning perform on par with or better than those of background planning in both the regular RL and transfer learning settings.

In model-based reinforcement learning (RL), an agent can use a learned model to improve its behavior in several ways. Two of the most prevalent are decision-time planning and background planning. In this study, we examine how the value-based versions of these two planning methods compare against each other across different settings. To this end, we first consider the simplest instantiations of value-based decision-time and background planning methods and provide theoretical results on which one performs better in the regular RL and transfer learning settings. Then, we consider their modern instantiations and provide hypotheses on which one performs better in the same settings. Finally, we perform illustrative experiments to validate these theoretical results and hypotheses. Overall, our findings suggest that even though the value-based versions of the two planning methods perform on par in their simplest instantiations, the modern instantiations of value-based decision-time planning methods can perform on par with or better than the modern instantiations of value-based background planning methods in both the regular RL and transfer learning settings.
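To make the distinction between the two planning styles concrete, here is a minimal illustrative sketch in Python. It is not the paper's algorithm or experimental setup: the 5-state chain environment, the `model` function (a stand-in for a learned model, assumed here to be exact and deterministic), and the names `background_planning` and `decision_time_planning` are all assumptions made for illustration. Background planning uses the model ahead of time to improve a stored value table, while decision-time planning uses the model only when an action is needed, searching forward from the current state.

```python
# Hypothetical 5-state chain: action 0 moves left, action 1 moves right.
# Entering the rightmost state gives reward 1 and ends the episode.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9
GOAL = N_STATES - 1

def model(s, a):
    """Stand-in for a learned deterministic model: (next_state, reward, done)."""
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, GOAL)
    done = s2 == GOAL
    return s2, float(done), done

def background_planning(sweeps=50):
    """Improve a stored Q-table with model-generated backups before acting."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(GOAL):  # the terminal state keeps Q = 0
            for a in range(N_ACTIONS):
                s2, r, done = model(s, a)
                q[s][a] = r + (0.0 if done else GAMMA * max(q[s2]))
    return q

def lookahead(s, depth):
    """Best discounted return reachable from s within `depth` model steps."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in range(N_ACTIONS):
        s2, r, done = model(s, a)
        best = max(best, r + (0.0 if done else GAMMA * lookahead(s2, depth - 1)))
    return best

def decision_time_planning(s, depth):
    """Choose an action for the current state only, with no stored value table."""
    returns = []
    for a in range(N_ACTIONS):
        s2, r, done = model(s, a)
        returns.append(r + (0.0 if done else GAMMA * lookahead(s2, depth - 1)))
    return max(range(N_ACTIONS), key=lambda a: returns[a])

q_table = background_planning()
background_actions = [max(range(N_ACTIONS), key=lambda a: q_table[s][a])
                      for s in range(GOAL)]
decision_time_actions = [decision_time_planning(s, depth=N_STATES)
                         for s in range(GOAL)]
print(background_actions, decision_time_actions)
```

With an exact model and a sufficiently deep search, both planners select the same actions in this toy problem, which is loosely in the spirit of the abstract's statement that the simplest instantiations perform on par. The sketch says nothing about the modern instantiations or the transfer learning setting.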

