LG · AI · Feb 17, 2025

Maximum Entropy Reinforcement Learning with Diffusion Policy

arXiv:2502.11612v3 · 28 citations · h-index: 2 · Has Code · ICML
Originality: Incremental advance
AI Analysis

This work, aimed at researchers and practitioners, addresses insufficient exploration in complex multi-goal RL environments. It is an incremental improvement that applies diffusion models within an existing MaxEnt RL framework.

The paper tackles the limited exploration capacity of Gaussian policies in Maximum Entropy Reinforcement Learning (MaxEnt RL) by using a diffusion model as a multimodal policy representation. On Mujoco benchmarks, this approach outperforms Gaussian policies and is competitive with other state-of-the-art diffusion-based RL algorithms.
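The unimodality argument can be made concrete with a toy 1-D example. This is a minimal sketch, not MaxEntDP: it assumes a hypothetical two-mode action density (modes at ±2, standing in for exp(Q(s, a)/α) at one fixed state) whose noisy score is known in closed form, whereas the paper learns its policy with a network. It compares the best single Gaussian (moment matching) against a generic variance-exploding reverse-SDE sampler in the style of score-based diffusion models, a swapped-in textbook technique rather than the paper's own sampler. All names (`MODES`, `MODE_STD`, `noisy_score`, `sample_reverse_diffusion`, etc.) are illustrative.

```python
import math
import random
from typing import List, Tuple

# Hypothetical toy target: a 1-D action density with two equally weighted modes,
# standing in for a multimodal Boltzmann policy exp(Q(s, a) / alpha) at one state.
MODES = (-2.0, 2.0)
MODE_STD = 0.3


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def target_density(x: float) -> float:
    """Density of the equal-weight two-component Gaussian mixture."""
    return sum(normal_pdf(x, mu, MODE_STD) for mu in MODES) / len(MODES)


def moment_matched_gaussian() -> Tuple[float, float]:
    """Mean and std of the single Gaussian sharing the mixture's first two moments."""
    mean = sum(MODES) / len(MODES)
    var = MODE_STD ** 2 + sum((mu - mean) ** 2 for mu in MODES) / len(MODES)
    return mean, math.sqrt(var)


def noisy_score(x: float, sigma: float) -> float:
    """Exact score d/dx log p_sigma(x), where p_sigma is the target convolved with
    N(0, sigma^2). A diffusion policy would instead learn this with a network."""
    var = MODE_STD ** 2 + sigma ** 2
    logs = [-0.5 * (x - mu) ** 2 / var for mu in MODES]
    top = max(logs)  # subtract the max exponent to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    grads = [-(x - mu) / var for mu in MODES]
    return sum(w * g for w, g in zip(weights, grads)) / sum(weights)


def sample_reverse_diffusion(n_samples: int, n_steps: int = 300,
                             sigma_max: float = 10.0, sigma_min: float = 0.01,
                             seed: int = 0) -> List[float]:
    """Draw samples with a variance-exploding reverse-SDE (Euler-Maruyama) sampler."""
    rng = random.Random(seed)
    # Geometric noise schedule from sigma_max down to sigma_min.
    sigmas = [sigma_max * (sigma_min / sigma_max) ** (i / (n_steps - 1))
              for i in range(n_steps)]
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma_max)  # start from the wide noise prior
        for i in range(n_steps - 1):
            step_var = sigmas[i] ** 2 - sigmas[i + 1] ** 2
            x += step_var * noisy_score(x, sigmas[i])  # move toward higher density
            x += math.sqrt(step_var) * rng.gauss(0.0, 1.0)  # re-inject noise
        samples.append(x)
    return samples


if __name__ == "__main__":
    mean, std = moment_matched_gaussian()
    print(f"moment-matched Gaussian: mean={mean:.2f}, std={std:.2f}")
    print(f"Gaussian density at its mean: {normal_pdf(mean, mean, std):.3f}")
    print(f"target density at that point: {target_density(mean):.2e}")
    xs = sample_reverse_diffusion(500)
    left = sum(x < 0 for x in xs) / len(xs)
    print(f"diffusion samples: {left:.2f} in left mode, {1 - left:.2f} in right mode")
```

The moment-matched Gaussian puts its peak at a = 0, where the toy target has almost no mass, while the diffusion samples split between the two modes. Replacing the analytic `noisy_score` with a learned, state-conditioned network is what would turn this sampler into a policy; how MaxEntDP trains that network is not reproduced here.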

The Soft Actor-Critic (SAC) algorithm with a Gaussian policy has become a mainstream implementation for realizing the Maximum Entropy Reinforcement Learning (MaxEnt RL) objective, which incorporates entropy maximization to encourage exploration and enhance policy robustness. While the Gaussian policy performs well on simpler tasks, its exploration capacity and potential performance in complex multi-goal RL environments are limited by its inherent unimodality. In this paper, we employ the diffusion model, a powerful generative model capable of capturing complex multimodal distributions, as the policy representation to fulfill the MaxEnt RL objective, developing a method named MaxEnt RL with Diffusion Policy (MaxEntDP). Our method enables efficient exploration and brings the policy closer to the optimal MaxEnt policy. Experimental results on Mujoco benchmarks show that MaxEntDP outperforms the Gaussian policy and other generative models within the MaxEnt RL framework, and performs comparably to other state-of-the-art diffusion-based online RL algorithms. Our code is available at https://github.com/diffusionyes/MaxEntDP.
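For reference, the standard MaxEnt RL objective, written in the form used by Soft Actor-Critic, and its Boltzmann-form optimal policy are given below. The notation is the common SAC one and may differ from the paper's own. When the soft Q-function has several well-separated maxima in a, the optimal policy is multimodal, which is exactly what a single Gaussian cannot represent.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big]

\pi^{*}(a \mid s) =
  \frac{\exp\big(Q^{*}_{\mathrm{soft}}(s, a) / \alpha\big)}
       {\int \exp\big(Q^{*}_{\mathrm{soft}}(s, a') / \alpha\big)\, \mathrm{d}a'}
```

Here α > 0 is the temperature that trades reward against entropy; as α shrinks, the objective approaches the standard expected-return objective.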

Code Implementations: 1 repo
