LG · AI · RO · Nov 9, 2021

Safe Policy Optimization with Local Generalized Linear Function Approximations

arXiv:2111.04894v1 · 11 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of applying safe RL to large-scale, real-world problems. It is an incremental improvement over existing safe RL methods that come with theoretical guarantees.

The paper tackles safe exploration in reinforcement learning for safety-critical systems. It proposes SPO-LF, a novel algorithm that uses generalized linear function approximations to learn how local sensor features relate to reward and safety. The results show that SPO-LF is more efficient in sample complexity and computational cost than previous methods with theoretical guarantees. It is also comparably sample-efficient to, and safer than, advanced deep RL methods with safety constraints.

Safe exploration is key to applying reinforcement learning (RL) to safety-critical systems. Existing safe exploration methods guarantee safety under regularity assumptions, but they have been difficult to apply to large-scale real problems. We propose a novel algorithm, SPO-LF, that optimizes an agent's policy while learning the relation between locally available features obtained by sensors and the environmental reward/safety, using generalized linear function approximations. We provide theoretical guarantees on its safety and optimality. We show experimentally that our algorithm is 1) more efficient in terms of sample complexity and computational cost and 2) more applicable to large-scale problems than previous safe RL methods with theoretical guarantees, and 3) comparably sample-efficient to, and safer than, existing advanced deep RL methods with safety constraints.
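The abstract does not spell out the estimator, but the general idea can be sketched. Safety is modeled as a generalized linear function of local features, and the agent only enters states whose pessimistic safety estimate clears a threshold. This is a minimal sketch, not the authors' SPO-LF implementation. It assumes a logistic link, a regularized maximum-likelihood fit via Newton's method, and a confidence width beta * ||x||_{V^-1} in the style of generalized linear bandits, with a fixed beta chosen only for illustration. The names `fit_glm`, `safety_bounds`, `pessimistic_safe_set`, `beta`, and `threshold` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Logistic link function mu(z) (assumed here; other GLM links are possible).
    return 1.0 / (1.0 + np.exp(-z))


def fit_glm(X, y, reg=1.0, iters=50):
    """L2-regularized maximum-likelihood fit of a logistic GLM via Newton's method.

    X: (n, d) local feature vectors observed so far (hypothetical sensor features).
    y: (n,) binary safety observations (1 = safe, 0 = unsafe).
    """
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + reg * theta          # gradient of regularized neg. log-likelihood
        w = p * (1.0 - p)                            # per-sample curvature weights
        H = (X * w[:, None]).T @ X + reg * np.eye(d)
        theta -= np.linalg.solve(H, grad)            # Newton step
    return theta


def safety_bounds(theta, X, x_query, reg=1.0, beta=2.0):
    """Lower/upper confidence bounds mu(x^T theta -/+ beta * ||x||_{V^-1})."""
    d = X.shape[1]
    V_inv = np.linalg.inv(X.T @ X + reg * np.eye(d))  # inverse regularized Gram matrix
    # Row-wise quadratic form x_i^T V^-1 x_i for every query feature vector.
    width = beta * np.sqrt(np.einsum("ij,jk,ik->i", x_query, V_inv, x_query))
    center = x_query @ theta
    return sigmoid(center - width), sigmoid(center + width)


def pessimistic_safe_set(lower, threshold):
    # Indices of states whose *lower* safety bound clears the threshold.
    return np.flatnonzero(lower >= threshold)


# Usage on synthetic data (the "true" parameter is invented for illustration).
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)

theta_hat = fit_glm(X, y)
states = np.array([[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0]])
lo, hi = safety_bounds(theta_hat, X, states)
safe = pessimistic_safe_set(lo, threshold=0.6)
print("theta_hat:", theta_hat)
print("lower:", lo, "upper:", hi, "safe states:", safe)
```

A state is treated as safe only if even its lower confidence bound exceeds the threshold. Feature directions that have been observed rarely (large ||x||_{V^-1}) are therefore avoided until data shrinks their uncertainty. In the bandit-style theory, beta grows with the number of samples and the allowed failure probability; keeping it fixed here is a simplification.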

Code Implementations: 1 repo
