LG · AI · RO · Jan 24, 2022

Constrained Policy Optimization via Bayesian World Models

arXiv:2201.09802v4 · 76 citations
AI Analysis

This paper addresses safety-critical tasks in real-world RL deployments. It is an incremental improvement: a novel method for a known bottleneck.

The paper tackles sample efficiency and safety in reinforcement learning for high-stakes applications. It proposes LAMBDA, a model-based approach that uses Bayesian world models to optimize policies in constrained Markov decision processes, and it reports state-of-the-art sample efficiency and constraint violation on the Safety-Gym benchmark.

Improving sample efficiency and safety are crucial challenges when deploying reinforcement learning in high-stakes real-world applications. We propose LAMBDA, a novel model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes. Our approach utilizes Bayesian world models, and harnesses the resulting uncertainty to maximize optimistic upper bounds on the task objective, as well as pessimistic upper bounds on the safety constraints. We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
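The idea of pairing an optimistic bound on the objective with a pessimistic bound on the constraint can be illustrated with a minimal sketch. It assumes each posterior sample of the world model has already produced one estimated return and one estimated cumulative cost for the current policy, and it takes the maximum over samples for both quantities. The function names, the cost budget, and the plain Lagrangian penalty with a projected multiplier update are hypothetical choices made for illustration, not the paper's implementation.

```python
from typing import Sequence, Tuple


def optimistic_pessimistic_bounds(
    returns: Sequence[float], costs: Sequence[float]
) -> Tuple[float, float]:
    """Upper bounds over posterior samples (assumed inputs).

    returns[i] and costs[i] are the estimated return and cumulative cost of
    the policy under the i-th sampled world model.
    """
    if not returns or len(returns) != len(costs):
        raise ValueError("need one return and one cost per posterior sample")
    # Optimistic for the task: the best return any sampled model predicts.
    # Pessimistic for safety: the worst cost any sampled model predicts.
    return max(returns), max(costs)


def penalized_objective(
    returns: Sequence[float],
    costs: Sequence[float],
    cost_budget: float,
    multiplier: float,
) -> float:
    """Plain Lagrangian penalty (an assumption, used here for simplicity)."""
    best_return, worst_cost = optimistic_pessimistic_bounds(returns, costs)
    return best_return - multiplier * (worst_cost - cost_budget)


def update_multiplier(
    multiplier: float, worst_cost: float, cost_budget: float, step_size: float
) -> float:
    """Gradient-ascent step on the multiplier, projected to stay non-negative."""
    return max(0.0, multiplier + step_size * (worst_cost - cost_budget))


if __name__ == "__main__":
    # Hypothetical estimates from three posterior samples.
    sampled_returns = [10.0, 12.5, 11.0]
    sampled_costs = [20.0, 31.0, 24.0]
    print(optimistic_pessimistic_bounds(sampled_returns, sampled_costs))
    print(penalized_objective(sampled_returns, sampled_costs, 25.0, 0.5))
```

Taking the maximum for the cost means the policy is penalized whenever any plausible model predicts a violation, while taking the maximum for the return rewards trying behaviour that some plausible model considers promising.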

Code Implementations: 1 repo
