LGAI
Mar 13, 2025

Enhance Exploration in Safe Reinforcement Learning with Contrastive Representation Learning

arXiv:2503.10318v1
h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses exploration challenges in safe RL for sparse-reward environments, but it is incremental: it builds on existing domain transfer methods, adding a novel representation learning approach.

The paper tackles inadequate exploration in safe reinforcement learning, caused by false positives from prior Q-functions, by learning a contrastive state representation that distinguishes safe from unsafe states. The results show improved exploration in three MiniGrid navigation environments while balancing safety and efficiency.

In safe reinforcement learning, the agent must balance exploratory actions against safety constraints. Following this paradigm, domain transfer approaches learn a prior Q-function from related environments to prevent unsafe actions. However, because of the large number of false positives, some safe actions are never executed, leading to inadequate exploration in sparse-reward environments. In this work, we aim to learn an efficient state representation that balances exploration and safety-preferring actions in a sparse-reward environment. First, the image input is mapped to a latent representation by an auto-encoder. A contrastive learning objective is then employed to distinguish safe from unsafe states. In the learning phase, the latent distance is used to construct an additional safety check, which allows the agent to bias its exploration if it visits an unsafe state. To verify the effectiveness of our method, experiments are carried out in three navigation-based MiniGrid environments. The results show that our method explores the environment better while maintaining a good balance between safety and efficiency.
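The two latent-space ideas in the abstract, a contrastive objective separating safe from unsafe states and a distance-based safety check, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the loss form, so a standard margin-based pairwise contrastive loss is assumed here, and the function names, the `margin` and `threshold` parameters, and the stored set of unsafe latents are all hypothetical.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def contrastive_loss(z_a: Vec, z_b: Vec, same_class: bool, margin: float = 1.0) -> float:
    """Margin-based pairwise contrastive loss (assumed form, not taken from the paper).

    Pairs with the same safety label are pulled together; pairs with
    different labels are pushed at least `margin` apart.
    """
    d = math.dist(z_a, z_b)  # Euclidean distance between the two latents
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def is_near_unsafe(z: Vec, unsafe_latents: Sequence[Vec], threshold: float = 0.5) -> bool:
    """Hypothetical safety check: flag a latent that lies within
    `threshold` of any stored latent of a known unsafe state."""
    return any(math.dist(z, u) < threshold for u in unsafe_latents)


# Illustrative usage with made-up 2-D latents (an auto-encoder would produce these).
unsafe_bank = [(2.0, 2.0)]
flag_close = is_near_unsafe((2.1, 2.0), unsafe_bank)  # latent close to an unsafe one
flag_far = is_near_unsafe((0.0, 0.0), unsafe_bank)    # latent far from all unsafe ones
```

In a full agent, a flag from such a check would be one possible signal for biasing exploration away from the flagged state; how the paper combines it with the prior Q-function is not stated in the abstract.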

