LG | Apr 9, 2021

Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning

arXiv:2104.04174v1 | 3 citations
Originality: Incremental advance
AI Analysis

The paper addresses a specific bottleneck in sample-efficient model-based RL, but the advance is incremental: it builds on existing reweighting and meta-gradient techniques.

The paper tackles inaccurate or biased dynamics models in model-based reinforcement learning, which can harm training on imaginary trajectories. It proposes adaptively reweighting each imaginary transition according to its effect on the loss computed on real samples. The authors report that the method outperforms state-of-the-art algorithms on multiple tasks.

Model-based reinforcement learning (RL) is more sample efficient than model-free RL because it uses imaginary trajectories generated by a learned dynamics model. When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions. To alleviate this problem, this paper proposes to adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories. More specifically, we evaluate the effect of an imaginary transition by calculating the change in the loss computed on the real samples when we use the transition to train the action-value and policy functions. Based on this evaluation criterion, we reweight each imaginary transition using a meta-gradient algorithm. Extensive experimental results demonstrate that our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks. Visualization of the changing weights further supports the need for the reweighting scheme.
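The evaluation criterion in the abstract (how much the real-sample loss changes after training on an imaginary transition) can be illustrated with a toy sketch. This is not the paper's algorithm: a linear regressor stands in for the action-value function, free per-transition weights stand in for whatever weighting function the authors learn, and every name here (`weighted_step`, `meta_update_weights`, the data, the learning rates) is a hypothetical choice made for illustration.

```python
# Toy sketch under stated assumptions: linear model, squared loss,
# one free weight per imaginary transition, one-step lookahead.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sample_grad(theta, x, y):
    """Gradient of 0.5 * (theta . x - y)^2 with respect to theta."""
    err = dot(theta, x) - y
    return [err * xi for xi in x]

def mean_grad(theta, data):
    """Gradient of the mean squared loss over a batch."""
    grads = [sample_grad(theta, x, y) for x, y in data]
    return [sum(g[j] for g in grads) / len(data) for j in range(len(theta))]

def weighted_step(theta, imagined, weights, lr):
    """One gradient step on the weighted loss over imaginary samples."""
    grads = [sample_grad(theta, x, y) for x, y in imagined]
    n = len(imagined)
    new_theta = [
        t - lr * sum(w * g[j] for w, g in zip(weights, grads)) / n
        for j, t in enumerate(theta)
    ]
    return new_theta, grads

def meta_update_weights(theta, imagined, real, weights, lr, meta_lr):
    """Move each weight against d(real loss after the step) / d(weight)."""
    lookahead, grads = weighted_step(theta, imagined, weights, lr)
    real_grad = mean_grad(lookahead, real)
    n = len(imagined)
    new_weights = []
    for w, g in zip(weights, grads):
        # Chain rule through the step: d(lookahead)/d(w_i) = -lr * g_i / n.
        meta_grad = -lr * dot(g, real_grad) / n
        # Clip to [0, 1] so a weight can switch a transition off but not flip it.
        new_weights.append(min(1.0, max(0.0, w - meta_lr * meta_grad)))
    return new_weights

# Real samples follow y = 2*x0 - x1; the third imaginary sample has a
# wrong target, mimicking a transition from a biased dynamics model.
real = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
imagined = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 0.0], -2.0)]

theta = [0.0, 0.0]
weights = [0.5, 0.5, 0.5]
for step in range(200):
    weights = meta_update_weights(theta, imagined, real, weights,
                                  lr=0.1, meta_lr=1.0)
    theta, _grads = weighted_step(theta, imagined, weights, lr=0.1)
```

In this toy setting the weight of the corrupted transition is driven toward zero while the weight of the accurate transition on the same input grows, because a step on the corrupted sample raises the real-sample loss and a step on the accurate one lowers it. A practical implementation would more likely compute the meta-gradient with an automatic differentiation library than by hand.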

