LG, AI | May 23, 2024

Offline Reinforcement Learning from Datasets with Structured Non-Stationarity

Johannes Ackermann, Takayuki Osa, Masashi Sugiyama

arXiv:2405.14114v2 | 10.45 citations | h-index: 5 | Has Code | RLJ

Originality: Highly original

AI Analysis

This paper addresses a novel offline RL setting, offering incremental improvements by handling structured non-stationarity in the dataset.

The paper tackles offline reinforcement learning from datasets that exhibit structured non-stationarity: the transition and reward functions change gradually between episodes but remain constant within each episode. It proposes a method based on Contrastive Predictive Coding that identifies this non-stationarity in the dataset, accounts for it during policy training, and predicts it during evaluation. In continuous control and locomotion tasks, the method often achieves oracle performance and outperforms the baselines.
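To make the setting concrete, the following is a minimal sketch of a dataset with this structure. It is not the authors' environment: the 1-D dynamics, the reward, the linear drift, and the names `collect_dataset`, `hidden`, and `drift` are all assumptions chosen for illustration. The only property taken from the description is that the hidden parameter changes between episodes and stays fixed within an episode.

```python
import numpy as np


def collect_dataset(num_episodes=5, steps_per_episode=4, drift=0.1, seed=0):
    """Toy data collection with structured non-stationarity (illustrative only)."""
    rng = np.random.default_rng(seed)
    hidden = 0.0  # hypothetical hidden parameter that shapes dynamics and reward
    episodes = []
    for _ in range(num_episodes):
        state = 0.0
        transitions = []
        for _ in range(steps_per_episode):
            # Random behavior policy (assumption)
            action = rng.uniform(-1.0, 1.0)
            # Transition and reward both depend on the hidden parameter,
            # which is constant for the whole episode.
            next_state = state + action + hidden
            reward = -abs(next_state - hidden)
            transitions.append((state, action, reward, next_state))
            state = next_state
        episodes.append({"hidden": hidden, "transitions": transitions})
        # The hidden parameter changes only between episodes.
        hidden += drift
    return episodes
```

In an offline dataset of this kind the `hidden` field would not be observed; it is stored here only so the structure can be inspected.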

Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy. Offline RL aims to solve this issue by using transitions collected by a different behavior policy. We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode. We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation. We analyze our proposed method and show that it performs well in simple continuous control tasks and challenging, high-dimensional locomotion tasks. We show that our method often achieves the oracle performance and performs better than baselines.
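Contrastive Predictive Coding is usually trained with an InfoNCE objective, which scores a context embedding against one positive target and several negatives. The sketch below shows that loss in NumPy under simplifying assumptions: a plain dot-product score instead of a learned one, and the other rows of the batch used as negatives. The function name `info_nce_loss` and the input shapes are hypothetical, and this is not the paper's implementation.

```python
import numpy as np


def info_nce_loss(context, target):
    """InfoNCE loss for a batch of paired embeddings.

    context, target: arrays of shape (batch, dim). Row i of `target` is the
    positive for row i of `context`; all other rows act as negatives.
    """
    # Pairwise dot-product scores, shape (batch, batch)
    logits = context @ target.T
    # Subtract the row-wise max for numerical stability
    logits = logits - logits.max(axis=1, keepdims=True)
    # Row-wise log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-probability of the positive pair, averaged over the batch
    return -np.mean(np.diag(log_probs))
```

With uninformative embeddings the loss equals the log of the batch size, and it approaches zero as each context scores its own target far above the negatives.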
