LGJun 13, 2024

DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning

Xuemin Hu, Shen Li, Yingfen Xu, Bo Tang, Long Chen

arXiv:2406.09089v12.6

Originality Incremental advance

AI Analysis

This work addresses the problem of learning optimal policies from offline datasets for RL practitioners, offering an incremental improvement over existing GAN-based methods.

The paper tackles the extrapolation error issue in offline reinforcement learning by proposing DiffPoGAN, which uses diffusion models as policy generators and introduces regularization methods to approximate behavior policies and constrain exploration, achieving state-of-the-art performance on D4RL datasets.

Offline reinforcement learning (RL) can learn optimal policies from pre-collected offline datasets without interacting with the environment, but the sampled actions of the agent cannot often cover the action distribution under a given state, resulting in the extrapolation error issue. Recent works address this issue by employing generative adversarial networks (GANs). However, these methods often suffer from insufficient constraints on policy exploration and inaccurate representation of behavior policies. Moreover, the generator in GANs fails in fooling the discriminator while maximizing the expected returns of a policy. Inspired by the diffusion, a generative model with powerful feature expressiveness, we propose a new offline RL method named Diffusion Policies with Generative Adversarial Networks (DiffPoGAN). In this approach, the diffusion serves as the policy generator to generate diverse distributions of actions, and a regularization method based on maximum likelihood estimation (MLE) is developed to generate data that approximate the distribution of behavior policies. Besides, we introduce an additional regularization term based on the discriminator output to effectively constrain policy exploration for policy improvement. Comprehensive experiments are conducted on the datasets for deep data-driven reinforcement learning (D4RL), and experimental results show that DiffPoGAN outperforms state-of-the-art methods in offline RL.

View on arXiv PDF

Similar