LG · AI · Dec 13, 2022

PPO-UE: Proximal Policy Optimization via Uncertainty-Aware Exploration

arXiv:2212.06343v1 · 2 citations · h-index: 92
Originality: Incremental advance
AI Analysis

This work targets training stability and performance, which matter to deep reinforcement learning practitioners. It is incremental because it modifies an existing method (PPO) rather than proposing a new one.

The paper tackles a training-stability issue in Proximal Policy Optimization (PPO) that is caused by homogeneous exploration. It proposes PPO-UE, a PPO variant with self-adaptive uncertainty-aware exploration, which considerably outperforms baseline PPO on Roboschool continuous control tasks.

Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO could cause an unexpected stability issue in the training phase. To address this issue, we propose PPO-UE, a PPO variant equipped with self-adaptive uncertainty-aware explorations (UEs) based on a ratio uncertainty level. The proposed PPO-UE is designed to improve convergence speed and performance with an optimized ratio uncertainty level. Through extensive sensitivity analysis by varying the ratio uncertainty level, our proposed PPO-UE considerably outperforms the baseline PPO in Roboschool continuous control tasks.
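For readers less familiar with the baseline, here is a minimal sketch of the clipped surrogate objective of standard PPO, the method PPO-UE builds on. It assumes per-sample log-probabilities under the new and old policies and advantage estimates are already computed. The function name `clipped_surrogate` and the default `clip_eps = 0.2` are illustrative choices, not taken from the paper. It does not implement PPO-UE's uncertainty-aware exploration or its ratio uncertainty level, whose exact definitions are not given in this summary.

```python
import math
from typing import Sequence


def clipped_surrogate(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # illustrative default, not the paper's setting
) -> float:
    """Batch mean of standard PPO's clipped surrogate objective (to be maximized).

    Hypothetical helper for illustration; PPO-UE's exploration changes are not modeled.
    """
    if not advantages or not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio into [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (lower) bound of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    value = clipped_surrogate(
        new_logp=[math.log(1.5), math.log(0.5)],
        old_logp=[0.0, 0.0],
        advantages=[2.0, -1.0],
    )
    print(round(value, 6))
```

In this example the first ratio (1.5) is clipped to 1.2, giving 2.4 instead of 3.0, and the second term takes the more pessimistic value 0.8 × (−1) = −0.8 instead of −0.5, so the batch mean is about 0.8. Taking the minimum of the clipped and unclipped terms removes the incentive to push the ratio far from 1, which is what keeps each PPO update close to the previous policy.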
