ML · LG · May 12, 2014

Structural Return Maximization for Reinforcement Learning

arXiv:1405.2606v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses over-fitting issues in batch RL for practitioners dealing with limited data, though it appears incremental as it builds on existing statistical learning theory.

The paper tackles over-fitting in batch reinforcement learning when only limited data is available. It applies Structural Risk Minimization, with Rademacher complexity as the capacity measure, to select a policy class sized to the data. The resulting method maximizes a bound on return while making only weak assumptions about the system.

Batch Reinforcement Learning (RL) algorithms attempt to choose a policy from a designer-provided class of policies given a fixed set of training data. Choosing the policy which maximizes an estimate of return often leads to over-fitting when only limited data is available, due to the size of the policy class in relation to the amount of data available. In this work, we focus on learning policy classes that are appropriately sized to the amount of data available. We accomplish this by using the principle of Structural Risk Minimization, from Statistical Learning Theory, which uses Rademacher complexity to identify a policy class that maximizes a bound on the return of the best policy in the chosen policy class, given the available data. Unlike similar batch RL approaches, our bound on return requires only extremely weak assumptions on the true system.
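The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or its bound: it assumes each candidate policy class is a finite list of policies, that every policy already has a per-episode return estimate normalized to [0, 1], and that the penalty has the shape of the standard empirical Rademacher bound (twice the complexity plus a confidence term). The function names, the Monte Carlo estimator, and the constants are illustrative assumptions, and the union bound over classes is omitted for brevity.

```python
import math
import random


def empirical_rademacher(policy_returns, n_draws=200, rng=None):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    policy_returns[k][i] is the (assumed) return estimate in [0, 1]
    of policy k on episode i. The class is assumed to be finite.
    """
    if rng is None:
        rng = random.Random(0)
    n = len(policy_returns[0])
    total = 0.0
    for _ in range(n_draws):
        # Random signs: how well can the class correlate with pure noise?
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += max(
            sum(s * r for s, r in zip(sigma, returns)) / n
            for returns in policy_returns
        )
    return total / n_draws


def select_policy_class(nested_classes, delta=0.05, rng=None):
    """Pick the class whose best policy has the highest penalized return.

    Returns (class index, policy index within that class, lower bound).
    """
    best = None
    for c, policy_returns in enumerate(nested_classes):
        n = len(policy_returns[0])
        means = [sum(r) / n for r in policy_returns]
        k = max(range(len(means)), key=means.__getitem__)
        # Illustrative penalty: complexity term plus confidence term.
        penalty = (
            2.0 * empirical_rademacher(policy_returns, rng=rng)
            + 3.0 * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        )
        bound = means[k] - penalty
        if best is None or bound > best[2]:
            best = (c, k, bound)
    return best
```

Choosing the policy with the highest empirical return alone can never favor the smaller of two nested classes, since the larger class contains every policy in the smaller one. The penalty grows with the capacity of the class, so on small data sets the bound-maximizing choice may be the smaller class.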

