LG · Jan 24, 2022

STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence

arXiv:2201.09857v5
AI Analysis

This paper addresses safe (risk-averse) policy optimization in robotics and is aimed at researchers and practitioners. It appears incremental, since it builds on existing actor-critic and policy gradient methods.

The paper tackles the difficulty of deploying risk-averse reinforcement learning by proposing STOPS, which learns from short-term trajectories to avoid visiting hazardous states. STOPS provably converges to a globally optimal policy at a sublinear rate, comparable to the convergence rate of state-of-the-art risk-neutral policy search methods, and is evaluated on MuJoCo simulation tasks.

It remains challenging to deploy existing risk-averse approaches in real-world applications. The reasons are manifold, including the lack of a global optimality guarantee and the need to learn from long-term consecutive trajectories. Long-term consecutive trajectories are prone to visiting hazardous states, which is a major concern in the risk-averse setting. This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term ones. Short-term trajectories are more flexible to generate and can avoid the danger of visiting hazardous states. Using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient, a rate comparable to the state-of-the-art convergence rate of risk-neutral policy search methods. The algorithm is evaluated on challenging MuJoCo robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results show that STOPS achieves state-of-the-art performance among existing risk-averse policy search methods.
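The abstract's key step is controlling volatility from short-term samples instead of full trajectories. The sketch below illustrates why that can work, assuming the per-step reward-volatility formulation used in related risk-averse work (mean per-step reward J minus lam times the per-step reward variance nu^2) together with the Fenchel-dual rewrite J^2 = max_y (2*J*y - y^2). It is not the paper's algorithm: discounting, the occupancy-measure weighting, and the actor-critic updates are omitted, and the function names and the weight `lam` are hypothetical.

```python
from typing import List, Sequence


def mean_volatility(rewards: Sequence[float], lam: float) -> float:
    """Mean-volatility objective J - lam * nu^2 from sampled per-step rewards.

    J is the mean per-step reward and nu^2 the variance of the per-step
    reward around J. Samples are weighted uniformly here, standing in for
    the (normalized) state-action occupancy distribution (an assumption).
    """
    n = len(rewards)
    j = sum(rewards) / n
    nu2 = sum((r - j) ** 2 for r in rewards) / n
    return j - lam * nu2


def augmented_rewards(rewards: Sequence[float], lam: float, y: float) -> List[float]:
    """Per-step augmented reward r - lam * r^2 + 2 * lam * y * r.

    Each term depends only on a single reward and the scalar y, so it can be
    computed from individual transitions (short-term samples).
    """
    return [r - lam * r * r + 2.0 * lam * y * r for r in rewards]


def dual_objective(rewards: Sequence[float], lam: float, y: float) -> float:
    """Mean augmented reward minus lam * y^2.

    By the identity J^2 = max_y (2*J*y - y^2), this is maximized at y = J,
    where it equals the mean-volatility objective.
    """
    aug = augmented_rewards(rewards, lam, y)
    return sum(aug) / len(aug) - lam * y * y


if __name__ == "__main__":
    # Toy per-step rewards, as if drawn from short transitions.
    rewards = [1.0, 0.0, 2.0, 1.0]
    lam = 0.5
    print(mean_volatility(rewards, lam))       # 0.75
    print(dual_objective(rewards, lam, 1.0))   # 0.75 (y = J = 1.0)
    print(dual_objective(rewards, lam, 0.5))   # 0.625 (any other y is lower)
```

Because the augmented reward involves only one reward and a scalar y, a standard risk-neutral learner such as PPO or NPG could in principle optimize it from individual transitions, alternating with an update of y toward the current mean reward. This alternating structure is an assumption about how such methods are commonly organized, not a description of STOPS itself.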
