LG AI

Jun 5, 2024

DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays

Bo Xia, Yilun Kong, Yongzhe Chang, Bo Yuan, Zhiheng Li, Xueqian Wang, Bin Liang

arXiv:2406.03102v19.28 citations

Originality: Incremental advance

AI Analysis

This paper addresses a specific challenge in reinforcement learning, namely tasks with delays, and offers an incremental improvement over existing methods.

The paper tackles reinforcement learning in environments with variable delays, which break the Markov assumption. It proposes the DEER framework, which uses a pretrained encoder to map delayed states and past actions into hidden states. The reported results show DEER outperforming state-of-the-art RL algorithms in both constant and random delay settings on Gym and MuJoCo environments.

Classic reinforcement learning (RL) frequently confronts challenges in tasks involving delays, which cause a mismatch between received observations and subsequent actions and thereby violate the Markov assumption. Existing methods usually tackle this issue with end-to-end solutions based on state augmentation. However, these black-box approaches often involve opaque processes and redundant information in the information states, causing instability and potentially undermining overall performance. To alleviate the delay challenges in RL, we propose DEER (Delay-resilient Encoder-Enhanced RL), a framework designed to improve interpretability and handle random delays. DEER employs an encoder, pretrained on datasets collected from delay-free environments, to map delayed states, together with the variable-length sequences of past actions that result from different delays, into hidden states. Across a variety of delayed scenarios, the trained encoder integrates with standard RL algorithms without further modification; the only change required is adapting the input dimension of the original algorithm. We evaluate DEER through extensive experiments on Gym and MuJoCo environments. The results confirm that DEER is superior to state-of-the-art RL algorithms in both constant and random delay settings.
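To make the two input representations in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It contrasts state augmentation (the delayed state concatenated with a padded action history) with an encoder-style mapping from a delayed state and a variable-length action sequence to a single compact state. Everything here is an assumption for illustration: `toy_dynamics` is a hypothetical 1-D point mass, and `rollout_encode` simply replays the pending actions through known dynamics, standing in for the learned, pretrained encoder that the paper describes.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # (position, velocity) in the toy example
Action = float


def toy_dynamics(state: State, action: Action) -> State:
    """Hypothetical delay-free dynamics: a 1-D point mass with step size 0.1."""
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return (pos, vel)


def augment(delayed_state: State, pending_actions: Sequence[Action],
            max_delay: int, pad: float = 0.0) -> List[float]:
    """State augmentation: a fixed-size vector [state, padded actions, delay].

    The output length is the same for every delay up to max_delay, so it
    grows with the largest delay the agent must handle.
    """
    if len(pending_actions) > max_delay:
        raise ValueError("more pending actions than max_delay")
    padding = [pad] * (max_delay - len(pending_actions))
    return list(delayed_state) + list(pending_actions) + padding + [
        float(len(pending_actions))
    ]


def rollout_encode(delayed_state: State, pending_actions: Sequence[Action],
                   dynamics: Callable[[State, Action], State]) -> State:
    """Stand-in for an encoder (assumption): replay the actions taken since
    the delayed observation to recover a compact estimate of the current state.
    Accepts action sequences of any length, including an empty one."""
    state = delayed_state
    for action in pending_actions:
        state = dynamics(state, action)
    return state


# Usage: the same delayed state observed under two different delays.
delayed = (0.0, 0.0)
short_history = [1.0]              # delay of 1 step
long_history = [1.0, -0.5, 0.25]   # delay of 3 steps

aug_short = augment(delayed, short_history, max_delay=3)
aug_long = augment(delayed, long_history, max_delay=3)
enc_short = rollout_encode(delayed, short_history, toy_dynamics)
enc_long = rollout_encode(delayed, long_history, toy_dynamics)
```

In this sketch the augmented vector has `len(state) + max_delay + 1` entries regardless of the actual delay, whereas the encoded output always has the dimension of the state itself. A downstream RL algorithm would then take the encoded output as its input, which corresponds to the abstract's remark that only the input dimension of the original algorithm needs to change.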
