AI · Jul 12, 2024

Constrained Intrinsic Motivation for Reinforcement Learning

Xiang Zheng, Xingjun Ma, Chao Shen, Cong Wang

arXiv:2407.09247v1 · 9.66 citations · h-index: 12 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses sample inefficiency and suboptimality in reinforcement learning with intrinsic motivation. It represents an incremental improvement over existing methods.

This paper tackles two problems in reinforcement learning with intrinsic motivation: designing effective intrinsic objectives for reward-free pre-training (RFPT), and reducing the bias that intrinsic objectives introduce in exploration tasks. It proposes Constrained Intrinsic Motivation (CIM) to address both. The results show that, in MuJoCo environments, CIM for RFPT surpasses fifteen existing methods in skill diversity, state coverage, and fine-tuning performance.

This paper investigates two fundamental problems that arise when utilizing Intrinsic Motivation (IM) for reinforcement learning in Reward-Free Pre-Training (RFPT) tasks and Exploration with Intrinsic Motivation (EIM) tasks: 1) how to design an effective intrinsic objective in RFPT tasks, and 2) how to reduce the bias introduced by the intrinsic objective in EIM tasks. Existing IM methods suffer from static skills, limited state coverage, sample inefficiency in RFPT tasks, and suboptimality in EIM tasks. To tackle these problems, we propose Constrained Intrinsic Motivation (CIM) for RFPT and EIM tasks, respectively: 1) CIM for RFPT maximizes the lower bound of the conditional state entropy subject to an alignment constraint on the state encoder network for efficient dynamic and diverse skill discovery and state coverage maximization; 2) CIM for EIM leverages constrained policy optimization to adaptively adjust the coefficient of the intrinsic objective to mitigate the distraction from the intrinsic objective. In various MuJoCo robotics environments, we empirically show that CIM for RFPT greatly surpasses fifteen IM methods for unsupervised skill discovery in terms of skill diversity, state coverage, and fine-tuning performance. Additionally, we showcase the effectiveness of CIM for EIM in redeeming intrinsic rewards when task rewards are exposed from the beginning. Our code is available at https://github.com/x-zheng16/CIM.
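
The abstract's second idea, adaptively adjusting the coefficient of the intrinsic objective through constrained policy optimization, can be illustrated with a generic Lagrangian-style sketch. This is not the authors' implementation. The function names, the constraint form (task return should reach a target), the mapping `1 / (1 + lam)`, and all hyperparameters are assumptions chosen for illustration; the actual method is in the linked repository.

```python
def update_intrinsic_coef(lam, task_return, target_return, lr=0.1, lam_max=100.0):
    """One projected dual-ascent step on a hypothetical Lagrange multiplier.

    Assumed constraint: task_return >= target_return.
    Returns the new multiplier and the resulting intrinsic coefficient.
    """
    # Positive when the constraint is violated, negative when it is satisfied.
    violation = target_return - task_return
    # Gradient step on the multiplier, projected onto [0, lam_max].
    lam = min(max(lam + lr * violation, 0.0), lam_max)
    # Assumed mapping: a larger multiplier shrinks the intrinsic weight.
    coef = 1.0 / (1.0 + lam)
    return lam, coef


def mixed_reward(task_reward, intrinsic_reward, coef):
    """Combine task and intrinsic rewards with the current coefficient."""
    return task_reward + coef * intrinsic_reward


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical per-iteration task returns against a fixed target of 3.0.
    for task_return in [1.0, 2.0, 4.0]:
        lam, coef = update_intrinsic_coef(lam, task_return, target_return=3.0, lr=0.5)
        print(f"task_return={task_return} lam={lam} coef={coef}")
```

In this sketch the intrinsic weight falls while the task return is below the target and recovers once the constraint is met, which is one simple way to keep an exploration bonus from distracting the policy from the task reward.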
