LG AI NI · Jan 14, 2022

Demystifying Reinforcement Learning in Time-Varying Systems

Pouya Hamadanian, Malte Schwarzkopf, Siddartha Sen, Mohammad Alizadeh

arXiv:2201.05560v23.32 citations

Originality: Incremental advance

AI Analysis

This work addresses non-stationarity in reinforcement learning for real-world systems. It offers a practical solution for domains such as distributed computing (straggler mitigation) and adaptive video streaming, although its improvement over existing methods is incremental.

The paper tackles the problem of applying reinforcement learning to time-varying systems. It develops a framework that identifies environments, triggers exploration, retains knowledge, and safeguards performance. Experiments on straggler mitigation and adaptive video streaming, using real-world and synthetic data, show that each component is both necessary and effective.

Recent research has turned to Reinforcement Learning (RL) to solve challenging decision problems, as an alternative to hand-tuned heuristics. RL can learn good policies without the need for modeling the environment's dynamics. Despite this promise, RL remains an impractical solution for many real-world systems problems. A particularly challenging case occurs when the environment changes over time, i.e., it exhibits non-stationarity. In this work, we characterize the challenges introduced by non-stationarity, shed light on the range of approaches to them and develop a robust framework for addressing them to train RL agents in live systems. Such agents must explore and learn new environments, without hurting the system's performance, and remember them over time. To this end, our framework (i) identifies different environments encountered by the live system, (ii) triggers exploration when necessary, (iii) takes precautions to retain knowledge from prior environments, and (iv) employs safeguards to protect the system's performance when the RL agent makes mistakes. We apply our framework to two systems problems, straggler mitigation and adaptive video streaming, and evaluate it against a variety of alternative approaches using real-world and synthetic data. We show that all components of the framework are necessary to cope with non-stationarity and provide guidance on alternative design choices for each component.
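The four framework components (i) to (iv) can be pictured as a single control loop. The Python sketch below is a minimal illustration under our own assumptions. It is not the authors' implementation, and several pieces are hypothetical stand-ins for whatever the paper actually uses:

- the class names `EnvironmentDetector` and `NonStationaryAgent`;
- the mean-shift test for recognizing an environment;
- the per-environment policy table, which is one simple way to retain knowledge;
- the reward-margin fallback rule.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, Dict, List, Tuple

# Illustrative only: a policy maps a scalar observation to a discrete action.
Policy = Callable[[float], int]


@dataclass
class EnvironmentDetector:
    """(i) Identify environments. Toy rule (an assumption, not the paper's):
    a window belongs to a known environment if its mean lies within
    `threshold` stored standard deviations of that environment's mean."""
    threshold: float = 3.0
    signatures: List[Tuple[float, float]] = field(default_factory=list)

    def classify(self, window: List[float]) -> int:
        m = mean(window)  # window must be non-empty
        for env_id, (mu, sigma) in enumerate(self.signatures):
            if abs(m - mu) <= self.threshold * sigma:
                return env_id
        # No match: register a signature for a previously unseen environment.
        self.signatures.append((m, max(pstdev(window), 1e-6)))
        return len(self.signatures) - 1


@dataclass
class NonStationaryAgent:
    """Hypothetical wrapper tying together components (i)-(iv)."""
    fallback: Policy                   # hand-tuned heuristic used as a safeguard
    new_policy: Callable[[], Policy]   # hypothetical factory for a fresh RL policy
    detector: EnvironmentDetector = field(default_factory=EnvironmentDetector)
    margin: float = 0.9                # (iv) tolerated fraction of fallback reward
    policies: Dict[int, Policy] = field(default_factory=dict)
    env_id: int = -1
    exploring: bool = False

    def on_window(self, window: List[float]) -> int:
        self.env_id = self.detector.classify(window)
        # (ii) Trigger exploration only when the environment has not been seen.
        self.exploring = self.env_id not in self.policies
        if self.exploring:
            # (iii) Keep one policy per environment, so training in a new
            # environment never overwrites what was learned in earlier ones.
            self.policies[self.env_id] = self.new_policy()
        return self.env_id

    def act(self, obs: float, rl_reward: float, fallback_reward: float) -> int:
        # (iv) Safeguard (assumes non-negative rewards): if no environment is
        # known yet, or the RL policy's recent reward falls below a fraction
        # of the heuristic's, hand control back to the heuristic.
        if self.env_id not in self.policies or rl_reward < self.margin * fallback_reward:
            return self.fallback(obs)
        return self.policies[self.env_id](obs)


agent = NonStationaryAgent(fallback=lambda obs: 0, new_policy=lambda: (lambda obs: 1))
calm = [1.0, 1.1, 0.9, 1.0]
busy = [5.0, 5.2, 4.8, 5.1]
print(agent.on_window(calm), agent.exploring)  # 0 True
print(agent.on_window(busy), agent.exploring)  # 1 True
print(agent.on_window(calm), agent.exploring)  # 0 False (recognized, no re-exploration)
print(agent.act(1.0, rl_reward=0.95, fallback_reward=1.0))  # 1 (RL policy acts)
print(agent.act(1.0, rl_reward=0.5, fallback_reward=1.0))   # 0 (safeguard engages)
```

Keeping a separate policy per environment is the bluntest way to avoid forgetting. It trades memory for isolation. The abstract notes that the paper compares alternative design choices for each component, so any real deployment would swap these toy rules for better-grounded ones.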

