RO AI LG

Jun 9, 2023

Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation

Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha

arXiv:2306.06192v9

3.64 citations

h-index: 102

Originality: Incremental advance

AI Analysis

This work addresses sample inefficiency in RL for robotic navigation, offering a practical solution for real-world deployment, though it is incremental as it builds on existing RL methods.

The paper tackles inefficient exploration in sparse-reward reinforcement learning for robotic navigation. It introduces Confidence-Controlled Exploration (CCE), which dynamically adjusts trajectory length based on policy entropy. Compared to baselines, CCE achieves an 18% higher success rate, 20-38% shorter paths, and 9.32% lower elevation costs.

Reinforcement learning (RL) is a promising approach for robotic navigation, allowing robots to learn through trial and error. However, real-world robotic tasks often suffer from sparse rewards, which, combined with the sample inefficiency of RL, lead to inefficient exploration and suboptimal policies. In this work, we introduce Confidence-Controlled Exploration (CCE), a novel method that improves sample efficiency in RL-based robotic navigation without modifying the reward function. Unlike existing approaches such as entropy regularization and reward shaping, which can introduce instability by altering rewards, CCE dynamically adjusts trajectory length based on policy entropy. Specifically, it shortens trajectories when uncertainty is high to enhance exploration and extends them when confidence is high to prioritize exploitation. CCE is a principled and practical solution inspired by a theoretical connection between policy entropy and gradient estimation. It integrates seamlessly with on-policy and off-policy RL methods and requires minimal modifications. We validate CCE across REINFORCE, PPO, and SAC in both simulated and real-world navigation tasks. CCE outperforms fixed-trajectory and entropy-regularized baselines, achieving an 18% higher success rate, 20-38% shorter paths, and 9.32% lower elevation costs under a fixed training sample budget. Finally, we deploy CCE on a Clearpath Husky robot, demonstrating its effectiveness in complex outdoor environments.
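
The core idea in the abstract, shorter trajectories when policy entropy is high and longer ones when it is low, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not state the actual update rule, so the linear mapping, the function names `policy_entropy` and `trajectory_length`, and the bounds `min_len` and `max_len` are all assumptions chosen for illustration.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def trajectory_length(entropy, max_entropy, min_len=10, max_len=200):
    """Map policy entropy to a rollout length (hypothetical rule).

    High entropy (uncertain policy) gives a short trajectory;
    low entropy (confident policy) gives a long one.
    The linear interpolation is an illustrative assumption,
    not the schedule used in the paper.
    """
    if max_entropy <= 0:
        return max_len
    # Normalize entropy to [0, 1], then invert it into a confidence score.
    normalized = min(max(entropy / max_entropy, 0.0), 1.0)
    confidence = 1.0 - normalized
    return int(round(min_len + confidence * (max_len - min_len)))


# Example: a 4-action policy; maximum entropy is log(4) for the uniform case.
n_actions = 4
h_max = math.log(n_actions)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain policy
peaked = [0.97, 0.01, 0.01, 0.01]    # confident policy

len_uncertain = trajectory_length(policy_entropy(uniform), h_max)
len_confident = trajectory_length(policy_entropy(peaked), h_max)
print(len_uncertain, len_confident)
```

In a training loop, a length computed this way would replace the fixed rollout horizon before each policy update, so the reward function itself stays untouched, which is the property the abstract emphasizes.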
