RO, LG · Jan 2, 2023

A Policy Optimization Method Towards Optimal-time Stability

Shengjie Wang, Fengbo Lan, Xiang Zheng, Yuxue Cao, Oluwatosin Oseni, Haotian Xu, Tao Zhang, Yang Gao

arXiv:2301.00521v21.94 citations · h-index: 11

Originality: Incremental advance

AI Analysis

This paper addresses stability issues in RL for robotics and offers a more efficient approach, though it appears incremental because it builds on existing Actor-Critic frameworks.

The paper tackles the problem of sub-optimal policies in model-free reinforcement learning due to infinite-time stability criteria, proposing a method that achieves optimal-time stability, resulting in significant performance improvements on ten robotic tasks.

In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system's state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system's state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the development of the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic tasks, our approach outperforms previous studies significantly, effectively guiding the system to generate stable patterns.
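To make the "infinite-time" limitation concrete, here is a minimal sketch of a standard sampled Lyapunov decrease check, not the ALAC criterion itself. It assumes a candidate function L and the textbook discrete-time condition L(s') - L(s) <= -alpha * L(s), which gives L(s_t) <= (1 - alpha)^t L(s_0): the value shrinks geometrically but need not reach zero in any finite number of steps. The function name, the toy dynamics s' = 0.8 s, and the choice L(s) = s^2 are hypothetical and chosen only for illustration.

```python
import numpy as np

def lyapunov_decrease_violation(l_now, l_next, alpha=0.1):
    """Mean violation of the sampled condition L(s') - L(s) <= -alpha * L(s).

    l_now and l_next hold the candidate Lyapunov values at sampled states
    and at their successors. A return value of 0.0 means every sampled
    transition satisfies the decrease condition.
    """
    l_now = np.asarray(l_now, dtype=float)
    l_next = np.asarray(l_next, dtype=float)
    # Positive entries mark transitions where L does not decrease enough.
    violation = np.maximum(l_next - (1.0 - alpha) * l_now, 0.0)
    return float(violation.mean())

# Hypothetical toy system: s' = 0.8 * s, candidate L(s) = s^2.
states = np.linspace(-2.0, 2.0, 9)
next_states = 0.8 * states

ok = lyapunov_decrease_violation(states**2, next_states**2, alpha=0.1)
too_strict = lyapunov_decrease_violation(states**2, next_states**2, alpha=0.5)

# Even when the condition holds, the state only approaches the equilibrium:
# after 50 steps from s0 = 2.0 it is small but still nonzero.
s_after_50 = 2.0 * 0.8**50
```

In this toy case L(s') = 0.64 L(s), so the check passes for alpha = 0.1 and fails for alpha = 0.5, while the state itself stays strictly positive after any finite number of steps. This asymptotic behavior is what the abstract contrasts with reaching the equilibrium within an optimal time.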
