LG · AI · May 23, 2024

Offline Reinforcement Learning from Datasets with Structured Non-Stationarity

arXiv:2405.14114v2 · 5 citations · h-index: 5 · RLJ
Originality: Highly original
AI Analysis

The paper addresses a novel offline RL setting and offers incremental improvements by handling structured non-stationarity in datasets.

The paper tackles offline reinforcement learning from datasets with structured non-stationarity: the transition and reward functions change gradually between episodes but remain constant within each episode. It proposes a method based on Contrastive Predictive Coding (CPC) that identifies and accounts for this non-stationarity. On continuous control and locomotion tasks, the method often matches oracle performance and outperforms baselines.
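To make the data setting concrete, here is a minimal sketch of how such a dataset could arise. Everything in it is a hypothetical toy, not one of the paper's benchmarks: a 1-D system, a latent parameter `theta` that follows an assumed sinusoidal schedule `hidden_param` over episode indices, and a uniform random behavior policy. `theta` is fixed within an episode and shifts the dynamics and the reward target, changing only slightly from one episode to the next.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Transition:
    episode: int
    state: float
    action: float
    reward: float
    next_state: float


def hidden_param(episode: int, period: int = 50) -> float:
    """Hypothetical schedule: the latent parameter drifts smoothly
    from one episode to the next (here a sinusoid over episodes)."""
    return math.sin(2.0 * math.pi * episode / period)


def collect_dataset(num_episodes: int, horizon: int, seed: int = 0) -> list[Transition]:
    """Collect transitions with a random behavior policy from a toy 1-D
    system whose dynamics and reward depend on a per-episode parameter."""
    rng = random.Random(seed)
    data = []
    for ep in range(num_episodes):
        theta = hidden_param(ep)  # constant for the whole episode
        state = 0.0
        for _ in range(horizon):
            action = rng.uniform(-1.0, 1.0)             # behavior policy
            next_state = state + action + 0.1 * theta   # theta shifts the dynamics
            reward = -abs(next_state - theta)           # theta also moves the goal
            data.append(Transition(ep, state, action, reward, next_state))
            state = next_state
    return data
```

In this toy, all transitions sharing an `episode` index see the same `theta`, and consecutive episodes differ by at most about 2π/50 ≈ 0.13 in `theta`. A learner that ignores this episode structure would see a mixture of dynamics and rewards, which is the situation the paper sets out to identify and account for.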

Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy. Offline RL aims to solve this issue by using transitions collected by a different behavior policy. We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode. We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation. We analyze our proposed method and show that it performs well in simple continuous control tasks and challenging, high-dimensional locomotion tasks. We show that our method often achieves oracle performance and performs better than baselines.
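The contrastive idea behind CPC can be sketched as an InfoNCE loss over per-episode latent vectors. This is a generic illustration under stated assumptions, not the authors' implementation: the abstract does not specify the encoder or exactly which quantities are contrasted. Here the latents `z_k` are assumed to be given (in practice an encoder would infer them from each episode's transitions), and a hypothetical bilinear predictor `W` is scored on whether `W z_k` ranks the next episode's latent `z_{k+1}` above the latents of other episodes.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, z):
    return [dot(row, z) for row in W]


def info_nce_next_episode(latents, W):
    """CPC-style InfoNCE loss over a sequence of per-episode latents.

    For each episode k, the prediction W @ z_k is scored against every
    candidate z_j (j = 1..K-1); the true next latent z_{k+1} is the positive
    and the remaining candidates act as negatives.
    """
    candidates = latents[1:]
    total = 0.0
    for k in range(len(latents) - 1):
        pred = matvec(W, latents[k])
        scores = [dot(pred, c) for c in candidates]  # bilinear scores z_j^T W z_k
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[k]  # candidates[k] is z_{k+1}, the positive
    return total / (len(latents) - 1)
```

A predictor that maps each latent to its successor drives this loss toward zero, while one that ignores the ordering (for example `W = 0`, where every score is equal) yields log(K - 1). Minimizing the loss therefore pushes the latents and predictor to capture how the environment drifts from one episode to the next, which is what would let such a model predict the upcoming episode's parameters at evaluation time.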

Code implementations: 1 repo