AI · Feb 21, 2024

Composing Reinforcement Learning Policies, with Formal Guarantees

arXiv:2402.13785v2 · 3 citations · h-index: 4 · AAMAS
Originality: Incremental advance
AI Analysis

This work addresses the challenge of scalable and reusable controller design in complex environments for robotics and AI systems, offering formal guarantees, though it builds incrementally on existing synthesis and reinforcement learning techniques.

The paper addresses controller design for environments with a two-level structure. Its framework separates high-level planning, done with reactive synthesis, from low-level policy training, done with reinforcement learning. This yields formal guarantees on performance and abstraction quality, demonstrated on challenging navigation tasks with moving obstacles and visual inputs.

We propose a novel framework for controller design in environments with a two-level structure: a known high-level graph ("map") in which each vertex is populated by a Markov decision process, called a "room". The framework "separates concerns" by using different design techniques for low- and high-level tasks. We apply reactive synthesis for high-level tasks: given a specification as a logical formula over the high-level graph and a collection of low-level policies obtained together with "concise" latent structures, we construct a "planner" that selects which low-level policy to apply in each room. We develop a reinforcement learning procedure to train low-level policies on latent structures, which, unlike previous approaches, circumvents a model distillation step. We pair the policy with probably approximately correct guarantees on its performance and on the abstraction quality, and lift these guarantees to the high-level task. These formal guarantees are the main advantage of the framework. Other advantages include scalability (rooms are large and their dynamics are unknown) and reusability of low-level policies. We demonstrate feasibility in challenging case studies where an agent navigates environments with moving obstacles and visual inputs.
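
To make the two-level idea concrete, here is a minimal Python sketch. It is not the paper's algorithm. It stands in for reactive synthesis with a plain best-path search for a reachability goal, and it stands in for the paper's guarantees with a textbook Hoeffding bound computed from evaluation episodes. The names `LowLevelPolicy`, `pac_lower_bound`, and `plan`, the room labels, and all numbers are hypothetical. The sketch also assumes that each policy's success is independent of how the agent entered the room, which the paper treats more carefully.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LowLevelPolicy:
    """Hypothetical record for a trained policy that crosses one room."""
    room: str       # vertex of the high-level map where the policy runs
    exit_to: str    # neighbouring room the policy tries to reach
    successes: int  # evaluation episodes that reached the exit
    episodes: int   # total evaluation episodes

    def pac_lower_bound(self, delta: float) -> float:
        """One-sided Hoeffding bound: with probability >= 1 - delta the
        true success probability is at least the returned value."""
        eps = math.sqrt(math.log(1.0 / delta) / (2.0 * self.episodes))
        return max(0.0, self.successes / self.episodes - eps)


def plan(policies, start, target, delta=0.05):
    """Pick the room sequence whose product of lower bounds is largest.

    A union bound splits delta evenly over all policies, so every
    per-room bound holds simultaneously with probability >= 1 - delta.
    """
    per_policy_delta = delta / len(policies)
    edges = {}
    for p in policies:
        bound = p.pac_lower_bound(per_policy_delta)
        edges.setdefault(p.room, []).append((p.exit_to, bound))

    # Dijkstra-style search; products of values in [0, 1] never increase
    # along a path, so the first time the target is popped is optimal.
    best = {start: 1.0}
    heap = [(-1.0, start, [start])]
    while heap:
        neg_prob, room, path = heapq.heappop(heap)
        prob = -neg_prob
        if room == target:
            return path, prob
        if prob < best.get(room, 0.0):
            continue  # stale heap entry
        for nxt, bound in edges.get(room, []):
            cand = prob * bound
            if cand > best.get(nxt, 0.0):
                best[nxt] = cand
                heapq.heappush(heap, (-cand, nxt, path + [nxt]))
    return None, 0.0


# Illustrative map with four rooms and made-up evaluation counts.
example_policies = [
    LowLevelPolicy("A", "B", 980, 1000),
    LowLevelPolicy("B", "D", 970, 1000),
    LowLevelPolicy("A", "C", 999, 1000),
    LowLevelPolicy("C", "D", 900, 1000),
]
example_path, example_bound = plan(example_policies, "A", "D")
print(example_path, round(example_bound, 3))
```

The point of the sketch is the separation of concerns: the planner only sees per-room bounds, never the room dynamics, so a policy can be retrained or reused without changing the high-level search.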

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
