LG, AI | Nov 30, 2022

Computationally Efficient Reinforcement Learning: Targeted Exploration leveraging Simple Rules

arXiv:2211.16691v3 | h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses the slow convergence of reinforcement learning, a practical obstacle for practitioners in control domains. The advance is incremental: it modifies existing actor-critic frameworks with simple rules rather than proposing a new algorithm.

The paper tackles the poor sample complexity of model-free reinforcement learning by incorporating expert-designed rules into actor-critic frameworks, so that agents avoid regions of the state-action space known to be suboptimal. On a room temperature control case study, agents converge up to 6-7x faster than classical agents while maintaining good final performance.

Model-free Reinforcement Learning (RL) generally suffers from poor sample complexity, mostly due to the need to exhaustively explore the state-action space to find well-performing policies. On the other hand, we postulate that expert knowledge of the system often allows us to design simple rules we expect good policies to follow at all times. In this work, we hence propose a simple yet effective modification of continuous actor-critic frameworks to incorporate such rules and avoid regions of the state-action space that are known to be suboptimal, thereby significantly accelerating the convergence of RL agents. Concretely, we saturate the actions chosen by the agent if they do not comply with our intuition and, critically, modify the gradient update step of the policy to ensure the learning process is not affected by the saturation step. On a room temperature control case study, it allows agents to converge to well-performing policies up to 6-7x faster than classical agents without computational overhead and while retaining good final performance.
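The saturation idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The comfort band, the normalized action range, the rule itself, and the function names (`rule_bounds`, `saturate`, `action_gradient`) are all assumptions made for illustration. The gradient handling shown is only one plausible way to keep a learning signal when an action has been clipped, and the abstract does not specify the paper's actual update.

```python
from typing import Tuple

# Hypothetical comfort band and action range, chosen only for illustration.
T_MIN, T_MAX = 21.0, 23.0   # comfort bounds in degrees Celsius (assumed)
A_MIN, A_MAX = -1.0, 1.0    # normalized action: -1 full cooling, +1 full heating (assumed)


def rule_bounds(temperature: float) -> Tuple[float, float]:
    """Return the action interval that an assumed expert rule allows in this state."""
    if temperature < T_MIN:      # too cold: never cool
        return 0.0, A_MAX
    if temperature > T_MAX:      # too warm: never heat
        return A_MIN, 0.0
    return A_MIN, A_MAX          # inside the band: no restriction


def saturate(action: float, temperature: float) -> float:
    """Clip the policy's raw action to the rule-compliant interval."""
    low, high = rule_bounds(temperature)
    return min(max(action, low), high)


def action_gradient(action: float, temperature: float, dq_da: float) -> float:
    """Ascent direction passed back to the policy output.

    Assumed variant: if the raw action already complies with the rule, use the
    critic's gradient dQ/da unchanged. Otherwise ignore the critic and point
    back toward the allowed interval, because plain clipping would give a
    zero gradient and the policy would never learn to respect the rule.
    """
    applied = saturate(action, temperature)
    if applied == action:
        return dq_da
    return applied - action
```

For example, at 19 °C a raw action of -0.5 (cooling) would be saturated to 0.0, and the policy would receive a signal of +0.5 pushing its output toward the allowed interval, regardless of what the critic suggests.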

