LG AI FL SY Apr 25, 2023

Fulfilling Formal Specifications ASAP by Model-free Reinforcement Learning

arXiv:2304.12508v13 citations | h-index: 68
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficient specification fulfillment in reinforcement learning. The contribution appears incremental, since it builds on existing actor-critic methods.

The paper tackles the problem of encouraging agents to fulfill formal specifications as soon as possible using a model-free reinforcement learning framework. It reports finding sufficiently fast trajectories in up to 97% of test cases and outperforming baselines.

We propose a model-free reinforcement learning solution, the ASAP-Phi framework, to encourage an agent to fulfill a formal specification as soon as possible (ASAP). The framework uses a piecewise reward function that assigns a quantitative semantic reward to traces that do not satisfy the specification, and a high constant reward to the rest. It then trains an agent with an actor-critic algorithm, such as soft actor-critic (SAC) or deep deterministic policy gradient (DDPG). Moreover, we prove that ASAP-Phi produces policies that prioritize fulfilling a specification ASAP. We run extensive experiments, including ablation studies, on state-of-the-art benchmarks. Results show that our framework finds sufficiently fast trajectories in up to 97% of test cases and outperforms the baselines.
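The piecewise reward described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the specification ("eventually come within `radius` of `goal`"), the robustness function `reach_robustness`, and the constant `SATISFIED_REWARD` are all hypothetical choices made for the example. The abstract does not say how the reward is attached to individual environment steps, so that part is left out.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical value: the abstract only says "a high constant reward".
SATISFIED_REWARD = 100.0


def reach_robustness(trace: Sequence[Point], goal: Point, radius: float) -> float:
    """Quantitative semantics for an illustrative spec (an assumption):
    'eventually be within `radius` of `goal`'.

    The value is non-negative exactly when some state on the trace lies
    within `radius` of the goal; otherwise it is negative, and closer to
    zero the nearer the trace came to the goal region.
    """
    return max(radius - math.dist(state, goal) for state in trace)


def piecewise_reward(trace: Sequence[Point], goal: Point, radius: float) -> float:
    """Piecewise reward: a high constant for satisfying traces, and the
    quantitative semantic value for traces that do not satisfy the spec."""
    rho = reach_robustness(trace, goal, radius)
    if rho >= 0.0:
        return SATISFIED_REWARD
    return rho


if __name__ == "__main__":
    goal, radius = (3.0, 0.0), 0.5
    print(piecewise_reward([(0.0, 0.0), (1.0, 0.0)], goal, radius))  # not satisfied
    print(piecewise_reward([(0.0, 0.0), (3.0, 0.0)], goal, radius))  # satisfied
```

In this sketch a trace that has not yet reached the goal still receives a graded signal (less negative as it gets closer), while any satisfying trace receives the same large constant. An actor-critic learner such as SAC or DDPG would consume a reward of this shape like any other scalar reward.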

