Bounded Exploration with World Model Uncertainty in Soft Actor-Critic Reinforcement Learning Algorithm
The paper addresses the exploration bottleneck that limits real-world DRL applications, though the contribution appears incremental because it builds on existing methods such as Soft Actor-Critic.
The paper tackles inefficient exploration in deep reinforcement learning by proposing bounded exploration, a method that combines 'soft' and intrinsic motivation exploration. The method improves the performance of Soft Actor-Critic and the convergence speed of its model-based extension, achieving the highest score in 6 out of 8 experiments.
One of the bottlenecks preventing Deep Reinforcement Learning (DRL) algorithms from being used in real-world applications is how to explore the environment and collect informative transitions efficiently. The present paper describes bounded exploration, a novel exploration method that integrates both 'soft' and intrinsic motivation exploration. Bounded exploration notably improved the performance of the Soft Actor-Critic algorithm and the convergence speed of its model-based extension. It achieved the highest score in 6 out of 8 experiments. Bounded exploration offers an alternative way to introduce intrinsic motivation into exploration when the original reward function has a strict meaning and should not be altered.
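The abstract does not spell out the mechanism, so the following is only a minimal sketch of one possible reading of "bounded" exploration suggested by the title: candidate actions are drawn from the stochastic policy (the 'soft' part), and the candidate on which an ensemble of world models disagrees most is executed (the intrinsic motivation part), leaving the reward function untouched. The names `bounded_explore`, `ensemble_disagreement`, `toy_policy`, and `toy_models`, the use of prediction variance as the uncertainty measure, and the scalar states and actions are all assumptions made for illustration, not details taken from the paper.

```python
import random
import statistics
from typing import Callable, Sequence

# Hypothetical types: states and actions are plain floats to keep the sketch small.
Policy = Callable[[float, random.Random], float]
Model = Callable[[float, float], float]


def ensemble_disagreement(models: Sequence[Model], state: float, action: float) -> float:
    """Variance of next-state predictions across the ensemble (assumed uncertainty proxy)."""
    predictions = [model(state, action) for model in models]
    return statistics.pvariance(predictions)


def bounded_explore(
    policy: Policy,
    models: Sequence[Model],
    state: float,
    rng: random.Random,
    n_candidates: int = 8,
) -> float:
    """Pick the policy-sampled action with the highest world-model disagreement."""
    # 'Soft' part: every candidate is sampled from the stochastic policy,
    # so exploration stays within actions the policy already considers plausible.
    candidates = [policy(state, rng) for _ in range(n_candidates)]
    # Intrinsic part: prefer the candidate the ensemble is least sure about.
    # The environment reward is never modified.
    return max(candidates, key=lambda a: ensemble_disagreement(models, state, a))


def toy_policy(state: float, rng: random.Random) -> float:
    """Stand-in for a SAC actor: a unit Gaussian that ignores the state."""
    return rng.gauss(0.0, 1.0)


# Stand-in ensemble: three linear models that disagree more for larger |action|.
toy_models = [lambda s, a, k=k: s + k * a for k in (0.5, 1.0, 1.5)]

if __name__ == "__main__":
    action = bounded_explore(toy_policy, toy_models, state=0.0, rng=random.Random(0))
    print("chosen action:", action)
```

In this toy setup the ensemble variance grows with the squared action, so the selected action is simply the sampled candidate of largest magnitude. With a learned ensemble, the same selection rule would favor regions where the world model is least certain.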