LG AI Jun 27, 2024

Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning

Bradley Burega, John D. Martin, Luke Kapeluck, Michael Bowling

arXiv:2406.19561v1 2.6

Originality: Incremental advance

AI Analysis

This work addresses sample-efficiency challenges in reinforcement learning systems that use model-based planning. It is an incremental advance: it builds on existing Dyna-style methods and contributes a new way of tuning how states are queried during planning.

The paper tackles the problem of staying sample-efficient in reinforcement learning when the environment model is imperfect, particularly when the learner is resource-constrained and the environment dynamics change over time. It introduces an online meta-gradient algorithm that tunes the probability with which states are queried in Dyna-style planning. The reported result is more efficient planning and, as a consequence, better sample efficiency for the learning process as a whole.

We study how a Reinforcement Learning (RL) system can remain sample-efficient when learning from an imperfect model of the environment. This is particularly challenging when the learning system is resource-constrained and in continual settings, where the environment dynamics change. To address these challenges, our paper introduces an online, meta-gradient algorithm that tunes a probability with which states are queried during Dyna-style planning. Our study compares the aggregate, empirical performance of this meta-gradient method to baselines that employ conventional sampling strategies. Results indicate that our method improves efficiency of the planning process, which, as a consequence, improves the sample-efficiency of the overall learning process. On the whole, we observe that our meta-learned solutions avoid several pathologies of conventional planning approaches, such as sampling inaccurate transitions and those that stall credit assignment. We believe these findings could prove useful, in future work, for designing model-based RL systems at scale.
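
The abstract does not give the update rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a tabular value function, a deterministic learned model, a softmax over states as the query distribution, an expected (probability-weighted) planning update in place of sampled queries, and a meta-loss equal to the squared TD error on one real transition. Every name here (`meta_gradient_step`, `model_reward`, `model_next`, `meta_lr`) is hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def meta_gradient_step(values, logits, model_reward, model_next, real,
                       alpha=0.5, meta_lr=1.0, gamma=0.9):
    """One planning update followed by one meta-gradient update of the query logits.

    Assumptions (not from the paper): tabular values, a deterministic model
    given by `model_reward[s]` and `model_next[s]`, and `real = (s, r, s_next)`
    holding one transition observed in the true environment.
    """
    probs = softmax(logits)

    # TD errors the (possibly inaccurate) model predicts for every state.
    model_td = model_reward + gamma * values[model_next] - values

    # Expected planning update: each state moves in proportion to the
    # probability that search control would query it.
    planned = values + alpha * probs * model_td

    # Meta-loss: half the squared TD error of the planned values on real data.
    s, r, s_next = real
    real_td = r + gamma * planned[s_next] - planned[s]
    loss = 0.5 * real_td ** 2

    # Gradient of the meta-loss with respect to the planned values.
    grad_planned = np.zeros_like(values)
    grad_planned[s_next] += real_td * gamma
    grad_planned[s] -= real_td

    # Chain rule through the planning update, then through the softmax.
    grad_probs = grad_planned * alpha * model_td
    grad_logits = probs * (grad_probs - probs @ grad_probs)

    return planned, logits - meta_lr * grad_logits, loss
```

In this sketch, a state whose model-based update increases the error on real data receives a positive gradient and is queried less often afterwards, which loosely mirrors the abstract's point about avoiding inaccurate transitions. The expected update is used only because it makes the gradient exact and deterministic; a sampled planner would need a different estimator.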
