RO LG | Jan 2, 2023

A Policy Optimization Method Towards Optimal-time Stability

arXiv:2301.00521v2 | 4 citations | h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses stability issues in RL for robotics and offers a more efficient approach, though it appears incremental because it builds on existing Actor-Critic frameworks.

The paper tackles the sub-optimal policies that arise in model-free reinforcement learning when stability criteria guarantee convergence only in infinite time. It proposes a method that achieves optimal-time stability and reports significant performance improvements on ten robotic tasks.

In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system's state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system's state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the development of the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic tasks, our approach outperforms previous studies significantly, effectively guiding the system to generate stable patterns.
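
To make "sampling-based Lyapunov stability" concrete, here is a minimal sketch of a generic sampled decrease check. It is not the ALAC criterion, which the abstract does not spell out. It assumes a Lyapunov candidate L has already been evaluated on sampled transitions (s, s'), and it tests whether L(s') - L(s) + alpha * L(s) is non-positive on average. The function names and the rate `alpha` are hypothetical. A geometric decrease of this kind only drives L toward zero asymptotically, which is the infinite-time behaviour the abstract contrasts with optimal-time stability.

```python
from statistics import mean


def lyapunov_decrease_violation(l_now, l_next, alpha=0.1):
    """Average of L(s') - L(s) + alpha * L(s) over sampled transitions.

    l_now  : Lyapunov candidate values L(s) at the sampled states
    l_next : Lyapunov candidate values L(s') at the successor states
    alpha  : hypothetical decrease rate in (0, 1], an assumption of this sketch

    A non-positive result means the sampled decrease condition holds
    on this batch; it says nothing about unsampled states.
    """
    if not l_now or len(l_now) != len(l_next):
        raise ValueError("l_now and l_next must be non-empty and equally long")
    return mean(ln - lc + alpha * lc for lc, ln in zip(l_now, l_next))


def is_stable_on_samples(l_now, l_next, alpha=0.1):
    """True if the batch satisfies the sampled decrease condition."""
    return lyapunov_decrease_violation(l_now, l_next, alpha) <= 0.0


if __name__ == "__main__":
    # Toy batch where the candidate halves at every step.
    l_now = [4.0, 2.0, 1.0]
    l_next = [2.0, 1.0, 0.5]
    print(is_stable_on_samples(l_now, l_next))
    # A batch with no decrease at all violates the condition.
    print(is_stable_on_samples([1.0], [1.0]))
```

In an Actor-Critic setting, a quantity like this violation term is typically added to the actor objective as a constraint or penalty; how ALAC does so is left to the paper.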

