LG · Sep 27, 2022

DCE: Offline Reinforcement Learning With Double Conservative Estimates

arXiv:2209.13132v1 · 1 citation · h-index: 66
Originality: Incremental advance
AI Analysis

This paper addresses a key challenge in offline RL for applications where data collection is costly or risky, though it appears incremental because it builds on existing conservative estimation methods.

The paper tackles the problem of overestimation of out-of-distribution actions in offline reinforcement learning by proposing DCE, a method using double conservative estimates with a V-function and controllable penalty, achieving state-of-the-art performance on the D4RL benchmark.

Offline reinforcement learning has attracted much interest as a way to address the challenges of applying traditional reinforcement learning. Offline reinforcement learning uses previously collected datasets to train agents without any interaction with the environment. To address the overestimation of OOD (out-of-distribution) actions, conservative estimation methods assign low values to all inputs. Previous conservative estimation methods usually struggle to avoid the impact of OOD actions on Q-value estimates. In addition, these algorithms usually sacrifice some computational efficiency to achieve conservative estimation. In this paper, we propose a simple conservative estimation method, double conservative estimates (DCE), which uses two conservative estimation methods to constrain the policy. Our algorithm introduces a V-function to avoid errors on in-distribution actions while implicitly achieving conservative estimation. In addition, our algorithm uses a controllable penalty term that changes the degree of conservatism during training. We theoretically show how this method influences the estimation of OOD actions and in-distribution actions. Our experiments separately show how the two conservative estimation methods affect the estimates for all state-action pairs. DCE demonstrates state-of-the-art performance on D4RL.
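The abstract does not give DCE's update rules, so the following is only a minimal sketch of the general idea it describes: bootstrapping the Q-target from a state-value estimate rather than from a maximum over actions, and subtracting an adjustable penalty. The function names, the `penalty` parameter, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
def naive_q_target(reward, next_q_values, done, gamma=0.99):
    """Standard TD target: bootstraps from max_a Q(s', a).

    If any action in next_q_values is out-of-distribution and overestimated,
    the max picks it up and the error propagates into the target.
    """
    bootstrap = 0.0 if done else gamma * max(next_q_values)
    return reward + bootstrap


def conservative_q_target(reward, next_v, done, gamma=0.99, penalty=0.0):
    """Illustrative conservative target (assumed form, not the paper's).

    Bootstraps from a state value V(s') fitted only on dataset actions, so
    no OOD action is queried. `penalty` is a hypothetical knob: larger
    values lower the target and make the estimate more conservative.
    """
    bootstrap = 0.0 if done else gamma * next_v
    return reward + bootstrap - penalty


# Toy numbers (made up): the dataset action has Q = 1.0, while an OOD
# action has an inflated Q = 5.0.
naive = naive_q_target(1.0, [1.0, 5.0], done=False, gamma=0.9)
conservative = conservative_q_target(1.0, 1.0, done=False, gamma=0.9, penalty=0.1)
print(naive > conservative)
```

In this toy case the max-based target inherits the inflated OOD value, while the V-based target stays near the value supported by the data and is lowered further by the penalty.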

