LG · AI · Sep 21, 2020

Dynamic Horizon Value Estimation for Model-based Reinforcement Learning

arXiv:2009.09593v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses sample efficiency in model-based reinforcement learning for visual control, offering an incremental improvement over existing methods.

The paper tackles a weakness of model-based value expansion: a fixed rollout horizon can harm learning when the world model is inaccurate. It proposes DMVE, which adapts the rollout horizon using reconstruction-based novelty detection, and reports better sample efficiency and final performance than the baselines on benchmark visual control tasks.
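The adaptive-horizon idea can be illustrated with a toy rule. This is a minimal sketch under stated assumptions, not DMVE's actual selection procedure. It treats the mean squared error between a raw image and its reconstruction as a novelty score. It then applies a hypothetical linear schedule that shortens the rollout horizon as novelty grows. The function names `novelty_score` and `select_horizon`, the thresholds `low` and `high`, and the schedule itself are all illustrative assumptions.

```python
import numpy as np


def novelty_score(raw, recon):
    """Mean squared error between a raw image and its reconstruction.

    Assumption: a higher score means the world model reconstructs the
    observation poorly, which we read as a sign the state is unfamiliar.
    """
    diff = np.asarray(raw, dtype=np.float64) - np.asarray(recon, dtype=np.float64)
    return float(np.mean(diff ** 2))


def select_horizon(score, max_horizon, low, high):
    """Hypothetical schedule mapping a novelty score to a rollout horizon.

    score <= low  -> max_horizon (trust the model for long rollouts)
    score >= high -> 1           (rely on the model as little as possible)
    in between    -> linear interpolation, rounded to an integer
    """
    if score <= low:
        return max_horizon
    if score >= high:
        return 1
    frac = (score - low) / (high - low)
    return max(1, int(round(max_horizon - frac * (max_horizon - 1))))


# Usage with synthetic 8x8 "images" (illustrative data only).
rng = np.random.default_rng(0)
raw = rng.random((8, 8))
good_recon = raw + 0.01 * rng.standard_normal((8, 8))  # close reconstruction
bad_recon = rng.random((8, 8))                         # unrelated reconstruction
for recon in (good_recon, bad_recon):
    s = novelty_score(raw, recon)
    h = select_horizon(s, max_horizon=15, low=0.001, high=0.1)
    print(f"score={s:.4f} horizon={h}")
```

In this sketch, a close reconstruction yields a low score and therefore the full horizon. A poor reconstruction pushes the horizon toward 1. The linear schedule is chosen only for readability. Any monotone mapping from novelty to horizon would express the same intuition.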

Existing model-based value expansion methods typically use a world model to estimate values over a fixed rollout horizon, which then assists policy learning. However, a fixed rollout horizon combined with an inaccurate model can harm the learning process. In this paper, we investigate using model knowledge for value expansion adaptively. We propose a novel method called Dynamic-horizon Model-based Value Expansion (DMVE), which adjusts how much the world model is used by varying the rollout horizon. Inspired by reconstruction-based techniques for novelty detection in visual data, we use a world model with a reconstruction module for image feature extraction to obtain more precise value estimates. Both the raw and the reconstructed images are used to determine the appropriate horizon for adaptive value expansion. On several benchmark visual control tasks, experimental results show that DMVE outperforms all baselines in sample efficiency and final performance, indicating that DMVE achieves more effective and accurate value estimation than state-of-the-art model-based methods.
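To make the "rollout horizon" concrete, the sketch below computes a standard H-step value expansion target in the spirit of model-based value expansion. The target sums H discounted imagined rewards and then bootstraps with the critic's estimate at the H-th imagined state. The function name and the array layout are illustrative assumptions rather than the paper's code, and the rule DMVE uses to choose H is not reproduced here.

```python
import numpy as np


def value_expansion_target(rewards, values, horizon, gamma=0.99):
    """H-step model-based value expansion target.

    rewards: imagined rewards r_0 .. r_{T-1} from a world-model rollout
    values:  critic estimates V(s_0) .. V(s_T), one more entry than rewards
    horizon: number of imagined steps H used before bootstrapping
    Returns sum_{k<H} gamma^k * r_k + gamma^H * V(s_H).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    if not 1 <= horizon <= len(rewards):
        raise ValueError("horizon must lie in [1, len(rewards)]")
    discounts = gamma ** np.arange(horizon)  # 1, gamma, ..., gamma^(H-1)
    return float(np.sum(discounts * rewards[:horizon]) + gamma**horizon * values[horizon])


# A fixed-horizon method uses the same H for every state; an adaptive method
# would choose H per state instead.
rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 10.0]
for h in (1, 2, 3):
    print(h, value_expansion_target(rewards, values, h, gamma=0.5))
```

With `gamma=0.5`, the loop prints `1 1.0`, `2 1.5`, and `3 3.0`. The same rollout yields quite different targets depending on H. If the imagined rewards or the final value come from an inaccurate model, a long fixed horizon propagates that error directly into the target.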
