LG | Jun 4, 2022

Hybrid Value Estimation for Off-policy Evaluation and Offline Reinforcement Learning

arXiv:2206.02000v1 | 4 citations | h-index: 45
Originality: Incremental advance
AI Analysis

This work targets accurate value estimation in offline reinforcement learning and offers an incremental improvement through a hybrid approach.

The paper proposes Hybrid Value Estimation (HVE) for value function estimation in offline reinforcement learning. HVE trades off bias and variance by balancing value estimates from offline data against those from a learned model, which yields a better error bound than direct methods. On MuJoCo tasks, OPHVE outperforms other off-policy evaluation methods, and MOHVE performs better than or comparably to state-of-the-art offline reinforcement learning algorithms.

Value function estimation is an indispensable subroutine in reinforcement learning, and it becomes more challenging in the offline setting. In this paper, we propose Hybrid Value Estimation (HVE) to reduce value estimation error. HVE trades off bias and variance by balancing between the value estimation from offline data and the value estimation from the learned model. Theoretical analysis shows that HVE enjoys a better error bound than the direct methods. HVE can be leveraged in both the off-policy evaluation and the offline reinforcement learning settings. We therefore provide two concrete algorithms, Off-policy HVE (OPHVE) and Model-based Offline HVE (MOHVE), one for each setting. Empirical evaluations on MuJoCo tasks corroborate the theoretical claim. OPHVE outperforms other off-policy evaluation methods in all three metrics measuring estimation effectiveness, while MOHVE achieves performance better than or comparable to that of state-of-the-art offline reinforcement learning algorithms. We hope that HVE will shed some light on further research on reinforcement learning from fixed data.
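The bias-variance trade-off mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's HVE estimator: it assumes a plain convex combination of two scalar estimates with independent errors, and the function names (`blended_mse`, `best_weight`) and the example bias and variance values are hypothetical. It only shows why mixing a noisy, unbiased data-based estimate with a biased, low-variance model-based estimate can give a lower mean squared error than either one alone.

```python
def blended_mse(w, bias_data, var_data, bias_model, var_model):
    """MSE of w * data_estimate + (1 - w) * model_estimate.

    Assumes the two estimation errors are independent, so the
    variances combine without a covariance term.
    """
    bias = w * bias_data + (1.0 - w) * bias_model
    variance = w ** 2 * var_data + (1.0 - w) ** 2 * var_model
    return bias ** 2 + variance  # MSE = bias^2 + variance


def best_weight(bias_data, var_data, bias_model, var_model, steps=1000):
    """Grid-search the mixing weight in [0, 1] that minimizes the blended MSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda w: blended_mse(w, bias_data, var_data, bias_model, var_model),
    )


if __name__ == "__main__":
    # Hypothetical numbers: the data-based estimate is unbiased but noisy,
    # the model-based estimate is biased but has no variance.
    args = dict(bias_data=0.0, var_data=4.0, bias_model=1.0, var_model=0.0)
    w = best_weight(**args)
    print("best weight on data estimate:", w)
    print("MSE data only:  ", blended_mse(1.0, **args))
    print("MSE model only: ", blended_mse(0.0, **args))
    print("MSE blended:    ", blended_mse(w, **args))
```

With these assumed numbers, the blended estimate has a lower error than either pure estimate, which is the intuition behind balancing the two sources. The paper's actual balancing mechanism and error bound are given in its theoretical analysis, not here.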
