LG | Dec 24, 2022

An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context

Xiaoyu Chen, Xiangming Zhu, Yufeng Zheng, Pushi Zhang, Li Zhao, Wenxue Cheng, Peng Cheng, Yongqiang Xiong, Tao Qin, Jianyu Chen, Tie-Yan Liu

arXiv:2212.12735v1 | 13.622 citations | h-index: 91

Originality: Incremental advance

AI Analysis

The work addresses a key challenge for deploying RL in real-world applications such as robotics and network control, where contexts change unpredictably. The approach is incremental, however, as it builds on prior context adaptation methods.

The paper tackles the problem of adapting reinforcement learning to environments with piecewise-stable contexts that change abruptly within episodes. It proposes the SeCBAD method, which demonstrates improved performance on grid world and MuJoCo tasks compared to existing methods.

One of the key challenges in deploying RL to real-world applications is adapting to variations in unknown environment contexts, such as changing terrains in robotic tasks and fluctuating bandwidth in congestion control. Existing works on adaptation to unknown environment contexts assume either that the context stays the same for the whole episode or that the context variables are Markovian. However, in many real-world applications, the environment context usually stays stable for a stochastic period and then changes abruptly and unpredictably within an episode, resulting in a segment structure that existing works fail to address. To leverage the segment structure of piecewise-stable context, we propose a Segmented Context Belief Augmented Deep (SeCBAD) RL method. Our method jointly infers the belief distribution over the latent context and the posterior over segment length, and performs more accurate context belief inference using the data observed within the current context segment. The inferred context belief is used to augment the state, leading to a policy that can adapt to abrupt variations in context. We demonstrate empirically that SeCBAD infers context segment length accurately and outperforms existing methods on a toy grid world environment and MuJoCo tasks with piecewise-stable context.
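
The idea of jointly tracking a posterior over segment length and a belief over the latent context can be illustrated with a much simpler classical stand-in. The sketch below is not SeCBAD, which the abstract describes as a deep RL method. It is a minimal Bayesian online changepoint filter under assumptions made purely for illustration: the context is a scalar Gaussian mean, observation noise is known, and the context changes with a constant hazard rate. All function names and parameter values (`update_belief`, `filter_sequence`, `augmented_state`, `hazard`, `prior_var`, `obs_var`) are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def update_belief(run_probs, means, variances, x,
                  hazard=0.05, prior_mean=0.0, prior_var=4.0, obs_var=0.25):
    """One filtering step after observing x.

    run_probs[r] is the posterior probability that the current segment
    contains r observations; means[r] and variances[r] describe the Gaussian
    belief over the latent context under that hypothesis.
    All default parameter values are illustrative assumptions.
    """
    # Predictive likelihood of x under each segment-length hypothesis.
    pred = [gaussian_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]
    weighted = [p * l for p, l in zip(run_probs, pred)]

    # Either the context changes (segment length resets to 0) or it persists.
    change = hazard * sum(weighted)
    growth = [(1.0 - hazard) * w for w in weighted]
    new_probs = [change] + growth
    total = sum(new_probs)
    new_probs = [p / total for p in new_probs]

    # A fresh segment starts from the prior; surviving segments absorb x
    # through the conjugate Gaussian update.
    new_means, new_vars = [prior_mean], [prior_var]
    for m, v in zip(means, variances):
        post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
        new_means.append(post_var * (m / v + x / obs_var))
        new_vars.append(post_var)
    return new_probs, new_means, new_vars


def filter_sequence(observations, prior_mean=0.0, prior_var=4.0):
    """Run the filter over a whole observation sequence."""
    probs, means, variances = [1.0], [prior_mean], [prior_var]
    for x in observations:
        probs, means, variances = update_belief(
            probs, means, variances, x,
            prior_mean=prior_mean, prior_var=prior_var)
    return probs, means, variances


def augmented_state(state, run_probs, means):
    """Append the expected context (averaged over segment lengths) to the state."""
    belief_mean = sum(p * m for p, m in zip(run_probs, means))
    return list(state) + [belief_mean]
```

As a usage example, feeding the filter 20 observations near 1.0 followed by a few near -2.0 should shift most of the segment-length posterior to short segments, so the context belief is dominated by data from the new segment rather than by stale data from the old one. A policy would then act on the output of `augmented_state` instead of the raw state.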
