RO, LG, Mar 7, 2023

A Multiplicative Value Function for Safe and Efficient Reinforcement Learning

arXiv:2303.04118v12 citations, h-index: 191, Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of deploying RL agents in real-world environments where safety constraints are critical, though it appears incremental as it builds on existing RL algorithms.

The paper tackles the problem of safe reinforcement learning by proposing a novel multiplicative value function that separates safety and reward critics, resulting in improved sample efficiency and successful zero-shot sim-to-real transfer in robot navigation tasks.

Safe Reinforcement Learning (RL) is an emerging field of sequential decision problems in which the objective is to maximize the reward while obeying safety constraints. Being able to handle constraints is essential for deploying RL agents in real-world environments, where constraint violations can harm the agent and the environment. To this end, we propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic. The safety critic predicts the probability of constraint violation and discounts the reward critic, which estimates only constraint-free returns. Splitting the responsibilities in this way simplifies the learning task and increases sample efficiency. We integrate our approach into two popular RL algorithms, Proximal Policy Optimization and Soft Actor-Critic, and evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations. Finally, we demonstrate zero-shot sim-to-real transfer in which a differential-drive robot has to navigate through a cluttered room. Our code can be found at https://github.com/nikeke19/Safe-Mult-RL.
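The idea of a safety critic that discounts a reward critic can be illustrated with a small sketch. It is not taken from the linked repository. It assumes the combination is the reward estimate multiplied by one minus the predicted violation probability, and that reward estimates are non-negative. The names `multiplicative_value` and `pick_action` and all numbers are hypothetical.

```python
# Illustrative sketch only, not the authors' implementation.
# Assumption: combined value = (1 - P(violation)) * constraint-free return.
# Assumption: reward estimates are non-negative, so a higher violation
# probability can only lower the combined value.

def multiplicative_value(reward_value: float, violation_prob: float) -> float:
    """Discount a constraint-free return estimate by the predicted safety."""
    if not 0.0 <= violation_prob <= 1.0:
        raise ValueError("violation_prob must lie in [0, 1]")
    if reward_value < 0.0:
        raise ValueError("this sketch assumes non-negative reward estimates")
    return (1.0 - violation_prob) * reward_value


def pick_action(candidates: dict) -> str:
    """Return the action whose safety-discounted value is largest.

    `candidates` maps an action name to a pair
    (reward critic estimate, safety critic violation probability).
    """
    return max(candidates, key=lambda a: multiplicative_value(*candidates[a]))


# Hypothetical critic outputs for two candidate actions.
candidates = {
    "shortcut": (10.0, 0.5),   # higher return, but risky
    "detour": (8.0, 0.05),     # lower return, much safer
}
best = pick_action(candidates)
print(best)
```

In this toy case the risky shortcut is discounted to half its reward estimate, so the safer detour is preferred even though its constraint-free return is lower.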

Code Implementations: 1 repo