LG | AI | ML | Mar 23, 2017

Unsupervised Basis Function Adaptation for Reinforcement Learning

arXiv:1703.07940v3 | 1 citation
Originality: Incremental advance
AI Analysis

This work aims to improve reinforcement learning efficiency by providing a computationally lightweight method for adapting approximation architectures. It appears incremental, as it builds on existing state aggregation and SARSA methods.

The paper tackles the challenge of determining suitable approximation architectures for value functions in reinforcement learning by introducing an unsupervised algorithm that adapts state aggregation based on state visitation frequency. The results show that the algorithm can significantly boost RL performance under commonly encountered conditions, with theoretical and experimental validation.

When using reinforcement learning (RL) algorithms it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on an agent's performance, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is currently interest among researchers in the potential for allowing RL algorithms to adaptively generate (i.e. to learn) approximation architectures. One relatively unexplored method of adapting approximation architectures involves using feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. In this article we will: (a) informally discuss the potential advantages offered by such methods; (b) introduce a new algorithm based on such methods which adapts a state aggregation approximation architecture on-line and is designed for use in conjunction with SARSA; (c) provide theoretical results, in a policy evaluation setting, regarding this particular algorithm's complexity, convergence properties and potential to reduce VF error; and finally (d) test experimentally the extent to which this algorithm can improve performance given a number of different test problems. Taken together our results suggest that our algorithm (and potentially such methods more generally) can provide a versatile and computationally lightweight means of significantly boosting RL performance given suitable conditions which are commonly encountered in practice.
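The general idea in the abstract, refining a state aggregation architecture where the agent spends most of its time while learning with SARSA, can be illustrated with a minimal sketch. This is not the paper's algorithm: the class `AdaptiveAggregation`, the function `sarsa_update`, the split-and-merge rule, and the choice to reset visit counts after each adaptation are all assumptions made for illustration, and states are assumed to be integers `0..n_states-1`.

```python
import bisect


class AdaptiveAggregation:
    """Hypothetical state aggregation over integer states 0..n_states-1.

    Cells are contiguous ranges; starts[i] is the first state of cell i.
    Requires n_cells <= n_states so that the starts are strictly increasing.
    """

    def __init__(self, n_states, n_cells, n_actions):
        self.n_states = n_states
        self.starts = [i * n_states // n_cells for i in range(n_cells)]
        self.visits = [0] * n_cells
        self.q = [[0.0] * n_actions for _ in range(n_cells)]

    def cell(self, state):
        # Index of the cell whose range contains `state`.
        return bisect.bisect_right(self.starts, state) - 1

    def _width(self, i):
        end = self.starts[i + 1] if i + 1 < len(self.starts) else self.n_states
        return end - self.starts[i]

    def adapt(self):
        """Split the most-visited cell and merge the least-visited adjacent
        pair, keeping the number of cells fixed. Returns True if changed."""
        n = len(self.starts)
        splittable = [i for i in range(n) if self._width(i) >= 2]
        if not splittable:
            return False
        s = max(splittable, key=lambda i: self.visits[i])
        # Candidate merges must not involve the cell being split.
        pairs = [i for i in range(n - 1) if i != s and i + 1 != s]
        if not pairs:
            return False
        m = min(pairs, key=lambda i: self.visits[i] + self.visits[i + 1])
        if self.visits[m] + self.visits[m + 1] >= self.visits[s]:
            return False  # merged pair would be visited as often as the split cell

        # Split cell s at its midpoint; the new cell inherits the old values.
        mid = self.starts[s] + self._width(s) // 2
        self.starts.insert(s + 1, mid)
        self.q.insert(s + 1, list(self.q[s]))
        if m > s:
            m += 1  # indices to the right of s shifted by the insert

        # Merge cells m and m+1, averaging their action values.
        del self.starts[m + 1]
        right = self.q.pop(m + 1)
        self.q[m] = [(a + b) / 2 for a, b in zip(self.q[m], right)]

        # Assumption: visit counts restart after every adaptation.
        self.visits = [0] * len(self.starts)
        return True


def sarsa_update(agg, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """One SARSA step on the aggregated value function; also logs the visit."""
    c, c2 = agg.cell(s), agg.cell(s2)
    agg.visits[c] += 1
    td_error = r + gamma * agg.q[c2][a2] - agg.q[c][a]
    agg.q[c][a] += alpha * td_error
```

In use, an agent would call `sarsa_update` on every transition and `adapt` at some fixed interval. Because the adaptation depends only on visit counts and never on rewards or value estimates, it is unsupervised in the sense the abstract describes.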

