LG, Feb 6, 2025

Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning

Wesley A. Suttle, Aamodh Suresh, Carlos Nieto-Granda

arXiv:2502.04141v14.14 citations, h-index: 15, ICLR

Originality: Incremental advance

AI Analysis

This work addresses dataset generation for offline RL in complex domains. It offers a novel method that improves on existing entropy-based approaches, though the advance is incremental because it builds on prior entropy concepts.

The authors address the problem of generating diverse datasets for offline reinforcement learning by proposing behavioral entropy as an exploration objective. They show that offline RL algorithms trained on datasets collected using behavioral entropy outperform those trained on datasets collected using Shannon entropy, SMM, and RND on all tasks considered, and outperform those trained on datasets collected using Rényi entropy on 80% of the tasks.

Entropy-based objectives are widely used to perform state space exploration in reinforcement learning (RL) and dataset generation for offline RL. Behavioral entropy (BE), a rigorous generalization of classical entropies that incorporates cognitive and perceptual biases of agents, was recently proposed for discrete settings and shown to be a promising metric for robotic exploration problems. In this work, we propose using BE as a principled exploration objective for systematically generating datasets that provide diverse state space coverage in complex, continuous, potentially high-dimensional domains. To achieve this, we extend the notion of BE to continuous settings, derive tractable $k$-nearest neighbor estimators, provide theoretical guarantees for these estimators, and develop practical reward functions that can be used with standard RL methods to learn BE-maximizing policies. Using standard MuJoCo environments, we experimentally compare the performance of offline RL algorithms for a variety of downstream tasks on datasets generated using BE, Rényi, and Shannon entropy-maximizing policies, as well as the SMM and RND algorithms. We find that offline RL algorithms trained on datasets collected using BE outperform those trained on datasets collected using Shannon entropy, SMM, and RND on all tasks considered, and on 80% of the tasks compared to datasets collected using Rényi entropy.
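
To make the idea of a $k$-nearest neighbor entropy estimator and an entropy-driven reward more concrete, here is a minimal sketch in Python. It is not the paper's BE estimator or reward, whose formulas are not given in this abstract. It instead shows the classical Kozachenko-Leonenko estimator for Shannon entropy (one of the baselines named above) and a simple particle-style bonus based on the log of the $k$-NN distance. The function names, the default `k=3`, and the offset `c` are illustrative assumptions, and the brute-force neighbor search is only suitable for small batches of states.

```python
import math
import random

def knn_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    out = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(dists[k - 1])
    return out

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))

def knn_shannon_entropy(points, k=3):
    """Kozachenko-Leonenko estimate of differential Shannon entropy (nats).

    Assumes the points are distinct, so every k-NN distance is positive.
    """
    n, d = len(points), len(points[0])
    # Log-volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1).
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    r = knn_distances(points, k)
    return (digamma_int(n) - digamma_int(k) + log_unit_ball
            + (d / n) * sum(math.log(ri) for ri in r))

def knn_rewards(points, k=3, c=1.0):
    """Illustrative exploration bonus: larger where visited states are sparse."""
    return [math.log(c + ri) for ri in knn_distances(points, k)]

if __name__ == "__main__":
    random.seed(0)
    states = [(random.random(), random.random()) for _ in range(400)]
    print("entropy estimate:", knn_shannon_entropy(states))
    print("first rewards:", knn_rewards(states)[:3])
```

In a sketch like this, the per-state bonus would stand in for the reward passed to a standard RL algorithm, so that the learned policy is pushed toward sparsely visited regions of the state space. A useful sanity check is that scaling every 2-D state by a factor of 2 shifts the entropy estimate by exactly $\log 4$, as expected for differential entropy.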
