LG, AI | Jul 6, 2023

Offline Reinforcement Learning with Imbalanced Datasets

Li Jiang, Sijie Cheng, Jielin Qiu, Haoran Xu, Wai Kin Chan, Zhao Ding

Tsinghua

arXiv:2307.02752v3 | 8.86 citations | h-index: 16

Originality: Highly original

AI Analysis

This work tackles the challenge of imbalanced real-world datasets for offline RL practitioners, offering a solution to improve policy extraction in skewed data environments.

The paper addresses imbalanced datasets in offline reinforcement learning, where state coverage follows a power-law distribution, and shows that existing methods such as CQL are ineffective in this setting. It proposes a method that augments CQL with a retrieval process to recall past experiences, and reports superior performance on tasks with varying levels of imbalance.

The prevalent use of benchmarks in current offline reinforcement learning (RL) research has led to a neglect of the imbalance of real-world dataset distributions in the development of models. Real-world offline RL datasets are often imbalanced over the state space because of exploration challenges or safety considerations. In this paper, we specify properties of imbalanced datasets in offline RL, where the state coverage follows a power-law distribution characterized by skewed policies. Theoretically and empirically, we show that typical offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective in extracting policies from imbalanced datasets. Inspired by natural intelligence, we propose a novel offline RL method that augments CQL with a retrieval process to recall past related experiences, effectively alleviating the challenges posed by imbalanced datasets. We evaluate our method on several tasks with varying levels of imbalance, using a variant of D4RL. Empirical results demonstrate the superiority of our method over other baselines.
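To make the two ideas in the abstract more concrete, here is a minimal Python sketch of (a) state coverage that follows a power law and (b) a nearest-neighbour lookup of stored experiences. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the imbalanced D4RL variant is built or how retrieval works, so the function names (`power_law_weights`, `sample_imbalanced_states`, `retrieve_nearest`), the discretisation of states into ranked bins, the `exponent` parameter, and the Euclidean distance are all hypothetical choices.

```python
import math
import random


def power_law_weights(n_bins, exponent=1.0):
    """Zipf-like weights (assumed form): the bin of rank k gets mass ~ k**(-exponent)."""
    raw = [(k + 1) ** (-exponent) for k in range(n_bins)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_imbalanced_states(n_samples, n_bins, exponent=1.0, seed=0):
    """Draw state-bin indices so that a few bins dominate the dataset."""
    rng = random.Random(seed)
    weights = power_law_weights(n_bins, exponent)
    return rng.choices(range(n_bins), weights=weights, k=n_samples)


def retrieve_nearest(query_state, memory, k=3):
    """Return the k stored (state, action, reward) tuples closest to query_state.

    Euclidean distance on raw states is an assumption made for this sketch.
    """
    ranked = sorted(memory, key=lambda item: math.dist(query_state, item[0]))
    return ranked[:k]


if __name__ == "__main__":
    # Coverage per bin for a strongly skewed dataset.
    states = sample_imbalanced_states(10_000, n_bins=10, exponent=1.5)
    counts = [states.count(b) for b in range(10)]
    print("samples per state bin:", counts)

    # Recall the stored experiences most similar to a rarely visited state.
    memory = [((0.0, 0.0), 0, 1.0), ((1.0, 1.0), 1, 0.0), ((5.0, 5.0), 0, 0.5)]
    print(retrieve_nearest((0.9, 0.9), memory, k=2))
```

With a larger `exponent`, the head bins absorb most samples and the tail bins are left with very few transitions, which is the regime in which the abstract argues distribution-constrained methods such as CQL struggle. The retrieval step shows only the lookup itself; how retrieved experiences are combined with CQL is described in the paper, not here.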
