LG, SY, Jun 17, 2024

Optimal Transport-Assisted Risk-Sensitive Q-Learning

arXiv:2406.11774v2, 3 citations
Originality: Incremental advance
AI Analysis

This work addresses safety concerns in reinforcement learning for applications where avoiding risky states is critical, though it appears incremental as it builds on existing Q-learning with optimal transport integration.

The paper tackled the problem of developing safer reinforcement learning policies by proposing a risk-sensitive Q-learning algorithm that integrates optimal transport theory to minimize visits to risky states while optimizing expected returns. The results showed significant reductions in risky state visits and faster convergence compared to traditional Q-learning in a Gridworld environment.

The primary goal of reinforcement learning is to develop decision-making policies that prioritize optimal performance, typically without considering risk or safety. In contrast, safe reinforcement learning aims to mitigate or avoid unsafe states. This paper presents a risk-sensitive Q-learning algorithm that leverages optimal transport theory to enhance agent safety. By integrating optimal transport into the Q-learning framework, our approach seeks to optimize the policy's expected return while minimizing the Wasserstein distance between the policy's stationary distribution and a predefined risk distribution, which encapsulates safety preferences from domain experts. We validate the proposed algorithm in a Gridworld environment. The results indicate that, compared to the traditional Q-learning algorithm, our method significantly reduces the frequency of visits to risky states and converges faster to a stable policy.
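To make the distance term in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a one-dimensional chain of states with unit spacing (the paper uses a Gridworld, where a general optimal transport solver would be needed), and it approximates the policy's stationary distribution by empirical visit frequencies along a trajectory. The function names, the trajectories, and the reference distribution are all hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def visitation_distribution(trajectory, n_states):
    """Empirical state-visit frequencies, used here as a stand-in
    for the policy's stationary distribution (an assumption)."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one state")
    counts = [0] * n_states
    for state in trajectory:
        counts[state] += 1
    return [c / len(trajectory) for c in counts]


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two distributions on states
    0..n-1 with unit spacing. In one dimension it equals the sum of
    absolute differences between the two cumulative distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    if abs(sum(p) - 1.0) > 1e-9 or abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("each distribution must sum to 1")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Hypothetical 4-state chain in which state 3 is treated as risky.
# The reference distribution puts no mass on the risky state.
reference = [0.4, 0.4, 0.2, 0.0]

risky_visits = visitation_distribution([0, 1, 2, 3, 3, 3, 2, 3], 4)
cautious_visits = visitation_distribution([0, 1, 0, 1, 2, 1, 0, 1], 4)

w_risky = wasserstein_1d(risky_visits, reference)
w_cautious = wasserstein_1d(cautious_visits, reference)
print(f"risky: {w_risky:.3f}, cautious: {w_cautious:.3f}")
```

In this toy setting the trajectory that lingers in the risky state lies farther from the reference distribution than the cautious one, which is the kind of signal a risk-sensitive learner could trade off against expected return. How the paper weights or incorporates this term into the Q-learning update is not specified in the abstract, so that step is left out here.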

