RO | LG | Feb 15, 2022

L2C2: Locally Lipschitz Continuous Constraint towards Stable and Smooth Reinforcement Learning

arXiv:2202.07152v1 | 35 citations
Originality: Incremental advance
AI Analysis

This paper addresses stability issues in reinforcement learning for applications such as robotics, but the advance is incremental because it builds on existing regularization methods.

The paper tackles instability and noise sensitivity in reinforcement learning by proposing L2C2, a regularization technique that enforces local Lipschitz continuity to achieve moderate smoothness in policy and value functions without losing expressiveness. Numerical simulations show that it improves task performance and smooths robot actions.

This paper proposes a new regularization technique for reinforcement learning (RL) that makes policy and value functions smooth and stable. RL is known for the instability of its learning process and the sensitivity of the acquired policy to noise. Several methods have been proposed to resolve these problems; in summary, they indicate that the smoothness of the policy and value functions learned in RL is the main factor behind them. However, if these functions are made excessively smooth, their expressiveness is lost and the globally optimal solution may not be obtained. This paper therefore considers RL under a local Lipschitz continuity constraint, called L2C2. By designing a spatio-temporally local compact space for L2C2 from the state transition at each time step, moderate smoothness can be achieved without loss of expressiveness. Numerical simulations with noise verified that the proposed L2C2 improves task performance while smoothing the robot actions generated by the learned policy.
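To make the idea of a local Lipschitz constraint built from state transitions more concrete, here is a minimal sketch in Python with NumPy. It is not the paper's loss function. The sampling rule (a box around the current state sized by the transition), the hinge-style penalty, and the names `sample_local_state`, `local_lipschitz_penalty`, `scale`, and `k` are all assumptions chosen for illustration.

```python
import numpy as np


def sample_local_state(s, s_next, scale=1.0, rng=None):
    """Sample a state from a box around s whose size is set by the
    transition s -> s_next (an assumed stand-in for the local compact space)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(-1.0, 1.0, size=s.shape)
    return s + scale * u * (s_next - s)


def local_lipschitz_penalty(f, s, s_tilde, k=1.0):
    """Hinge penalty that is positive only when
    ||f(s) - f(s_tilde)|| exceeds k * ||s - s_tilde||."""
    df = np.linalg.norm(f(s) - f(s_tilde))
    ds = np.linalg.norm(s - s_tilde)
    return max(0.0, df - k * ds)


# Toy linear "policy" whose global Lipschitz constant is 2 (largest singular value of W).
W = np.array([[2.0, 0.0], [0.0, 0.5]])


def policy(s):
    return W @ s


rng = np.random.default_rng(0)
s = np.array([0.0, 0.0])
s_next = np.array([0.1, -0.2])
s_tilde = sample_local_state(s, s_next, rng=rng)

# With k below the true constant the penalty may be active;
# with k equal to the true constant it vanishes.
penalty_tight = local_lipschitz_penalty(policy, s, s_tilde, k=1.0)
penalty_loose = local_lipschitz_penalty(policy, s, s_tilde, k=2.0)
```

In a training loop, a term like this would be added to the RL objective with a small weight, so that the function is only constrained near states the agent actually visits rather than over the whole state space.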

