LG | AI | Jun 12, 2020

Safety-guaranteed Reinforcement Learning based on Multi-class Support Vector Machine

arXiv:2006.07446v1
Originality: Incremental advance
AI Analysis

The paper addresses safety-critical applications by guaranteeing constraint satisfaction, but the advance is incremental because it builds on existing Q-learning and SVM methods.

The paper tackles the problem of satisfying hard state constraints in model-free reinforcement learning with deterministic dynamics. The resulting policy is guaranteed to satisfy the constraints and, because the formulation follows the Q-learning framework, to converge to the optimal solution.

Several works have addressed the problem of incorporating constraints in the reinforcement learning (RL) framework; however, the majority of them can only guarantee the satisfaction of soft constraints. In this work, we address the problem of satisfying hard state constraints in a model-free RL setting with deterministic system dynamics. The proposed algorithm is developed for discrete state and action spaces and utilizes a multi-class support vector machine (SVM) to represent the policy. The state constraints are incorporated in the SVM optimization framework to derive an analytical solution for determining the policy parameters. The final policy converges to a solution that is guaranteed to satisfy the constraints. Additionally, the proposed formulation adheres to the Q-learning framework and thus also guarantees convergence to the optimal solution. The algorithm is demonstrated with multiple example problems.
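
To make the setting concrete, here is a minimal sketch of tabular Q-learning on a discrete, deterministic grid with forbidden states. It is not the paper's method: the multi-class SVM policy and its analytical solution are replaced by a plain Q-table plus an action mask, and the grid, the `UNSAFE` set, the rewards, and all function names are assumptions made for illustration. The sketch only ensures that the final greedy policy avoids state-action pairs that were observed to violate a constraint; violations can still occur while learning.

```python
import random

# Hypothetical 4x4 grid world; none of these values come from the paper.
ROWS, COLS = 4, 4
START, GOAL = (0, 0), (3, 3)
UNSAFE = {(1, 1), (2, 1), (1, 3)}             # forbidden states (hard constraints)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, a):
    """Deterministic dynamics: move one cell, staying in place at a wall."""
    r = min(max(state[0] + ACTIONS[a][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[a][1], 0), COLS - 1)
    return (r, c)


def train(episodes=2000, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    """Q-learning that masks (state, action) pairs seen to enter UNSAFE."""
    rng = random.Random(seed)
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    Q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
    blocked = set()  # pairs observed to violate a state constraint
    for _ in range(episodes):
        s = START
        for _ in range(50):  # cap on episode length
            allowed = [a for a in range(len(ACTIONS)) if (s, a) not in blocked]
            if not allowed:
                break
            if rng.random() < eps:
                a = rng.choice(allowed)                    # explore
            else:
                a = max(allowed, key=lambda b: Q[(s, b)])  # exploit
            s2 = step(s, a)
            if s2 in UNSAFE:
                # Dynamics are deterministic, so one observation is enough
                # to rule this pair out permanently.
                blocked.add((s, a))
                break
            reward = 1.0 if s2 == GOAL else -0.01
            nxt = [b for b in range(len(ACTIONS)) if (s2, b) not in blocked]
            if s2 == GOAL or not nxt:
                target = reward
            else:
                target = reward + gamma * max(Q[(s2, b)] for b in nxt)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if s == GOAL:
                break
    return Q, blocked


def greedy_rollout(Q, blocked, max_steps=20):
    """Follow the masked greedy policy from START and return visited states."""
    s, path = START, [START]
    while s != GOAL and len(path) <= max_steps:
        allowed = [a for a in range(len(ACTIONS)) if (s, a) not in blocked]
        a = max(allowed, key=lambda b: Q[(s, b)])
        s = step(s, a)
        path.append(s)
    return path


if __name__ == "__main__":
    Q, blocked = train()
    print(greedy_rollout(Q, blocked))
```

The mask plays the role that the constraint terms play in the paper's SVM optimization: it restricts the policy to actions that keep the next state inside the allowed set, while the Q-learning update is left unchanged.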

