LG ML

Sep 16, 2021

Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning

Sarah Rathnam, Susan A. Murphy, Finale Doshi-Velez

arXiv:2109.08134v1

3.11 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses model inaccuracy in batch reinforcement learning, a concern for both researchers and practitioners. Its contribution is incremental: it compares and unifies existing methods rather than introducing a new paradigm.

The paper tackles poorly learned models and policies in batch reinforcement learning, which arise from poorly explored state-action pairs. It unifies three regularization methods in a common framework, a weighted average transition matrix, and confirms the intuitions this framework generates through empirical evaluation across a range of MDPs and data collection policies.

In batch reinforcement learning, poorly explored state-action pairs can lead to poorly learned, inaccurate models and poorly performing associated policies. Various regularization methods can mitigate the problem of learning overly complex models in Markov decision processes (MDPs); however, they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework: a weighted average transition matrix. Considering regularization methods in this common form illuminates how the MDP structure and the state-action pair distribution of the batch data set influence the relative performance of regularization methods. We confirm intuitions generated from the common framework by empirical evaluation across a range of MDPs and data collection policies.
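To make the idea of a weighted average transition matrix concrete, here is a minimal sketch of one familiar instance: blending the maximum-likelihood transition estimate with a prior distribution, where the weight on the prior shrinks as a state-action pair is visited more often. This is an illustration only and is not taken from the paper. The function name `weighted_average_transitions`, the pseudo-count parameter `strength`, and the weighting rule `strength / (n + strength)` are assumptions chosen for the example, and the paper's three methods may define their weights differently.

```python
import numpy as np

def weighted_average_transitions(counts, prior, strength):
    """Blend the MLE transition estimate with a prior (hypothetical helper).

    counts:   (S, A, S) array of observed transition counts from the batch.
    prior:    (S, A, S) array; each prior[s, a] is a distribution over next states.
    strength: positive pseudo-count controlling how strongly the prior is weighted.
    """
    counts = np.asarray(counts, dtype=float)
    prior = np.asarray(prior, dtype=float)

    # Number of times each (state, action) pair appears in the batch.
    n = counts.sum(axis=-1, keepdims=True)

    # Maximum-likelihood estimate; unvisited pairs fall back to the prior.
    mle = np.divide(counts, n, out=prior.copy(), where=n > 0)

    # Assumed weighting rule: rarely visited pairs lean on the prior.
    w = strength / (n + strength)

    return (1.0 - w) * mle + w * prior


# Two states, one action: pair (0, 0) is visited 4 times, pair (1, 0) never.
counts = np.array([[[3, 1]], [[0, 0]]])
prior = np.full((2, 1, 2), 0.5)  # uniform prior over next states
T_hat = weighted_average_transitions(counts, prior, strength=4.0)
```

Under this assumed rule, a well-explored pair stays close to its empirical estimate while an unexplored pair reduces to the prior, which is the kind of dependence on the batch's state-action distribution that the abstract describes.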
