LG | Jun 5, 2021

Learning Routines for Effective Off-Policy Reinforcement Learning

arXiv:2106.02943v1
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of user-defined action spaces in reinforcement learning with a more automated and efficient approach. It is an incremental advance, since it builds on existing off-policy methods.

The paper tackles the problem of designing appropriate action spaces in reinforcement learning by introducing a routine space, in which each routine represents a set of equivalent sequences of granular actions. Applying this framework to off-policy algorithms improves performance and reduces the number of environment interactions per episode.
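To make the idea of a routine concrete, here is a minimal toy sketch, not the authors' method. It assumes a hypothetical 1-D task (`ToyEnv`), a hypothetical hand-written decoder (`decode_routine`) that expands a scalar routine parameter `z` into a variable-length sequence of granular actions, and a helper (`run_routine`) that executes the sequence and aggregates it into one transition. In the paper the routine space is learned end-to-end; here the decoder is fixed purely for illustration. The point is that the agent makes one decision per routine while the environment still advances one granular step at a time.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ToyEnv:
    """Hypothetical 1-D task: move from position 0 to `goal` in bounded steps."""
    goal: float = 5.0
    pos: float = 0.0
    t: int = 0
    max_steps: int = 20

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.pos += max(-1.0, min(1.0, action))  # granular action clipped to [-1, 1]
        self.t += 1
        reward = -abs(self.goal - self.pos)
        # Episode ends at the goal or when the step budget runs out.
        done = abs(self.goal - self.pos) < 1e-9 or self.t >= self.max_steps
        return self.pos, reward, done


def decode_routine(z: float, max_len: int = 4) -> List[float]:
    """Hypothetical fixed decoder: routine parameter z -> granular action sequence.

    The length grows with |z| (between 1 and max_len), and the requested
    displacement z is split evenly across the actions (before env clipping).
    """
    length = max(1, min(max_len, math.ceil(abs(z))))
    return [z / length] * length


def run_routine(env: ToyEnv, actions: Sequence[float],
                gamma: float) -> Tuple[float, float, int, bool]:
    """Execute granular actions in order; return (next_state, discounted_return, k, done)."""
    ret, discount, k = 0.0, 1.0, 0
    state, done = env.pos, False
    for a in actions:
        state, r, done = env.step(a)
        ret += discount * r
        discount *= gamma
        k += 1
        if done:  # stop the routine early if the episode terminates
            break
    return state, ret, k, done


def rollout(policy: Callable[[float], float], gamma: float = 0.99):
    """Run one episode; each policy query selects a whole routine."""
    env = ToyEnv()
    decisions, transitions, done = 0, [], False
    while not done:
        s = env.pos
        z = policy(s)  # one decision per routine
        decisions += 1
        s_next, ret, k, done = run_routine(env, decode_routine(z), gamma)
        # Routine-level transition that an off-policy replay buffer could store.
        transitions.append((s, z, ret, k, s_next, done))
    return decisions, env.t, transitions


decisions, steps, transitions = rollout(lambda s: 5.0 - s)
print(decisions, steps)  # 2 5
```

With the greedy policy `z = goal - s`, the first routine expands into four granular actions and the second into one, so the episode takes two agent decisions for five environment steps.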

The performance of reinforcement learning depends upon designing an appropriate action space, where the effect of each action is measurable, yet granular enough to permit flexible behavior. So far, this process has involved non-trivial user choices in terms of the available actions and their execution frequency. We propose a novel framework for reinforcement learning that effectively lifts such constraints. Within our framework, agents learn effective behavior over a routine space: a new, higher-level action space, where each routine represents a set of 'equivalent' sequences of granular actions with arbitrary length. Our routine space is learned end-to-end to facilitate the accomplishment of underlying off-policy reinforcement learning objectives. We apply our framework to two state-of-the-art off-policy algorithms and show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode, improving computational efficiency.
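One way to picture how a routine plugs into an underlying off-policy objective is the bootstrapped critic target for a routine that ran k granular actions. The sketch below assumes a standard semi-Markov (options-style) multi-step backup; the symbols are introduced here for illustration, and the paper's exact objective may differ.

```latex
R_t = \sum_{i=0}^{k-1} \gamma^{i}\, r_{t+i},
\qquad
y_t = R_t + \gamma^{k}\,\bigl(1 - d_{t+k}\bigr)\,\bar{V}\bigl(s_{t+k}\bigr)
```

Here z_t is the routine chosen in state s_t, k is the number of granular actions it actually executed, d_{t+k} is the termination flag, and \bar{V} is a target-network estimate of the next state's value under the routine policy; the critic Q(s_t, z_t) would be regressed toward y_t. Discounting the bootstrap term by \gamma^{k} rather than \gamma keeps routines of different lengths on a consistent time scale.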
