ROLG · Oct 22, 2024

Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies

arXiv:2410.16632v1 · 0.027 citations · h-index: 5 · IROS
AI Analysis · 25

This work addresses a practical issue for deploying RL policies in robotics and other real-world applications, but it is incremental, since it builds on existing methods.

The paper tackles high-frequency oscillations in reinforcement learning (RL) policies, which are undesirable for real-world hardware deployment. It benchmarks existing mitigation methods and proposes hybrids that combine loss regularization with architectural approaches; the best hybrid improves control smoothness by 26.8% over the baseline, with a worst-case performance degradation of 2.8%.
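The summary reports smoothness gains as a percentage but does not define the metric. One measure commonly used in this literature, assumed here rather than confirmed for this paper, is the FFT-based score introduced with CAPS: Sm = 2/(n·f_s) Σ M_i f_i, where M_i is the amplitude of frequency component f_i, n is the number of frequency bins, and f_s is the sampling rate; lower means smoother. The minimal NumPy sketch below implements that formula. The function name `smoothness_score` and the amplitude normalization (FFT magnitude divided by trace length) are our own choices, not taken from the paper.

```python
import numpy as np


def smoothness_score(actions: np.ndarray, sample_rate_hz: float) -> float:
    """CAPS-style smoothness: Sm = 2 / (n * fs) * sum_i M_i * f_i (lower is smoother).

    `actions` is a 1-D trace of one action dimension sampled at `sample_rate_hz`.
    Assumption: amplitudes are |FFT| / len(actions); n is the number of frequency bins.
    """
    actions = np.asarray(actions, dtype=float)
    # One-sided spectrum of the real-valued action signal.
    amplitudes = np.abs(np.fft.rfft(actions)) / len(actions)
    freqs = np.fft.rfftfreq(len(actions), d=1.0 / sample_rate_hz)
    n_bins = len(freqs)
    # Weight each amplitude by its frequency, so high-frequency jitter dominates the score.
    return float(2.0 / (n_bins * sample_rate_hz) * np.sum(amplitudes * freqs))


# Usage: same-amplitude sine waves at 1 Hz and 20 Hz, sampled at 100 Hz for 2 s.
t = np.arange(200) / 100.0
slow = smoothness_score(np.sin(2 * np.pi * 1.0 * t), sample_rate_hz=100.0)
fast = smoothness_score(np.sin(2 * np.pi * 20.0 * t), sample_rate_hz=100.0)
print(f"slow={slow:.2e}  fast={fast:.2e}")
```

Because each amplitude is weighted by its frequency, the 20 Hz trace scores about twenty times higher than the 1 Hz trace of equal amplitude. This is why a metric of this shape penalizes exactly the actuator jitter the paper targets, while a constant action scores zero.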

Reinforcement learning (RL) policies are prone to high-frequency oscillations, which are especially undesirable when deploying to real-world hardware. In this paper, we identify, categorize, and compare methods from the literature that aim to mitigate high-frequency oscillations in deep RL. We define two broad classes: loss regularization and architectural methods. At their core, these methods incentivize learning a smooth mapping, such that nearby states in the input space produce nearby actions in the output space. We present benchmarks in terms of policy performance and control smoothness on traditional RL environments from Gymnasium and on a complex manipulation task, as well as on three robotic locomotion tasks that include deployment and evaluation on real-world hardware. Finally, we propose hybrid methods that combine elements from both loss regularization and architectural methods. We find that the best-performing hybrid outperforms the other methods and improves control smoothness by 26.8% over the baseline, with a worst-case performance degradation of just 2.8%.
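The two method families in the abstract can be illustrated with a small NumPy sketch built around a toy policy. On the loss-regularization side, it follows CAPS-style temporal and spatial terms: the Euclidean distance between actions for consecutive states, and between actions for a state and a Gaussian-perturbed copy of it. On the architectural side, it uses spectral normalization, one Lipschitz-bounding technique from this broader literature. The names `policy`, `spectral_normalize`, and `caps_regularizer`, the single-layer network, and the values of `sigma`, `lambda_t`, and `lambda_s` are all hypothetical. This is a sketch of the general idea, not the authors' implementation or their best-performing hybrid, which this excerpt does not specify.

```python
import numpy as np


def policy(weights: np.ndarray, states: np.ndarray) -> np.ndarray:
    """Hypothetical deterministic policy: a single linear layer with tanh squashing."""
    return np.tanh(states @ weights)


def spectral_normalize(weights: np.ndarray) -> np.ndarray:
    """Architectural constraint: divide by the largest singular value so the layer is 1-Lipschitz."""
    return weights / np.linalg.norm(weights, 2)


def caps_regularizer(weights, states, next_states, rng, sigma=0.05, lambda_t=1.0, lambda_s=1.0):
    """CAPS-style smoothness penalty to add to the actor loss (sigma and lambdas are placeholders)."""
    actions = policy(weights, states)
    # Temporal term: consecutive states s_t, s_{t+1} should yield nearby actions.
    temporal = np.mean(np.linalg.norm(actions - policy(weights, next_states), axis=-1))
    # Spatial term: states perturbed with N(0, sigma^2) noise should yield nearby actions.
    perturbed = states + rng.normal(scale=sigma, size=states.shape)
    spatial = np.mean(np.linalg.norm(actions - policy(weights, perturbed), axis=-1))
    return lambda_t * temporal + lambda_s * spatial


# Usage: a toy "hybrid" that applies the regularizer to a spectrally normalized policy.
rng = np.random.default_rng(0)
raw_w = rng.normal(size=(8, 2))  # obs_dim=8, act_dim=2 (arbitrary toy sizes)
w = spectral_normalize(raw_w)
states = rng.normal(size=(32, 8))
next_states = states + 0.01 * rng.normal(size=states.shape)
penalty = caps_regularizer(w, states, next_states, rng)
print(f"smoothness penalty: {penalty:.4f}")
```

In a real training loop, the penalty would be added to the actor loss and differentiated with an autodiff framework; NumPy is used here only to keep the sketch self-contained. Because tanh is 1-Lipschitz and the normalized weight matrix has spectral norm 1, the composed policy cannot move its action further than its input moved. That is the "nearby states, nearby actions" property the abstract describes.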

