SY LG · Mar 18, 2024

Decomposing Control Lyapunov Functions for Efficient Reinforcement Learning

arXiv:2403.12210v1 · 4.35 citations · h-index: 21 · Has Code · ACC

Originality: Incremental advance

AI Analysis

This work addresses inefficient data collection in real-world robotics, a problem faced by researchers and practitioners alike, and represents an incremental improvement over existing CLF-based methods.

The paper tackles the high sample complexity of reinforcement learning for robotics by introducing Decomposed Control Lyapunov Functions (DCLFs) for reward shaping. It shows that the method cuts the real-world data needed to land a quadcopter by more than half compared with the state-of-the-art Soft Actor-Critic algorithm.

Recent methods using Reinforcement Learning (RL) have proven successful for training intelligent agents in unknown environments. However, RL has not been widely applied in real-world robotics, because current state-of-the-art RL methods require large amounts of data to learn a specific task, which makes deploying an agent to collect that data prohibitively expensive. In this paper, we build on existing work that reshapes the RL reward function by introducing a Control Lyapunov Function (CLF), an approach that has been shown to reduce sample complexity. However, this formulation requires a CLF for the system, and because no general method for constructing one exists, identifying a suitable CLF is often difficult. Existing work can compute low-dimensional CLFs via a Hamilton-Jacobi reachability procedure, but this class of methods becomes intractable for high-dimensional systems. We address this problem by using a system decomposition technique to compute what we call Decomposed Control Lyapunov Functions (DCLFs). We use the computed DCLF for reward shaping, which we show improves RL performance. Through multiple examples, we demonstrate the effectiveness of this approach: our method finds a policy that successfully lands a quadcopter using less than half the real-world data required by the state-of-the-art Soft Actor-Critic algorithm.
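The reward-shaping idea can be illustrated with a minimal Python sketch. It rests on several assumptions made only for illustration: the state splits into subsystems given by index groups; each subsystem has a known CLF (here a hypothetical quadratic, whereas the authors compute subsystem CLFs with Hamilton-Jacobi reachability); the decomposed CLF is formed by taking the maximum over subsystem CLFs; and the CLF enters the reward as a standard potential-based shaping term with potential -V. The names `quadratic_clf`, `make_decomposed_clf`, and `shaped_reward` are hypothetical and not taken from the paper, whose exact combination rule and shaping formula may differ.

```python
from typing import Callable, List, Sequence

# A CLF maps a (sub)state vector to a nonnegative scalar.
CLF = Callable[[Sequence[float]], float]


def quadratic_clf(z: Sequence[float]) -> float:
    """Hypothetical subsystem CLF: squared distance to the origin."""
    return sum(v * v for v in z)


def make_decomposed_clf(subsystem_clfs: List[CLF],
                        index_groups: List[List[int]]) -> CLF:
    """Build an illustrative decomposed CLF for the full state.

    Assumption: each subsystem CLF is evaluated on its own slice of the
    state and the results are combined with max(), so V is zero only when
    every subsystem is at its equilibrium.
    """
    def V(x: Sequence[float]) -> float:
        return max(V_i([x[j] for j in idx])
                   for V_i, idx in zip(subsystem_clfs, index_groups))
    return V


def shaped_reward(V: CLF, x: Sequence[float], x_next: Sequence[float],
                  base_reward: float = 0.0, gamma: float = 0.99) -> float:
    """Potential-based shaping with potential Phi = -V (an assumption).

    F = gamma * Phi(x_next) - Phi(x) = V(x) - gamma * V(x_next), so a
    transition that decreases V earns a positive bonus.
    """
    return base_reward + V(x) - gamma * V(x_next)


if __name__ == "__main__":
    # Toy 4-D state split into two hypothetical 2-D subsystems.
    V = make_decomposed_clf([quadratic_clf, quadratic_clf], [[0, 1], [2, 3]])
    x = [1.0, 0.0, 0.5, 0.0]       # V(x) = max(1.0, 0.25) = 1.0
    x_next = [0.5, 0.0, 0.5, 0.0]  # V(x_next) = max(0.25, 0.25) = 0.25
    print(shaped_reward(V, x, x_next, gamma=1.0))  # 0.75: moved toward equilibrium
```

Taking the maximum is only one plausible way to combine subsystem CLFs; a weighted sum would be an equally reasonable hypothetical choice. Potential-based shaping is used here because it is a well-known form that leaves the optimal policy unchanged, which keeps the sketch neutral about the paper's specific formulation.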

