LG RO MLJan 20, 2020

Reinforcement Learning with Probabilistically Complete Exploration

Philippe Morere, Gilad Francis, Tom Blau, Fabio Ramos

arXiv:2001.06940v14.23 citations

Originality Incremental advance

AI Analysis

This addresses the problem of inefficient exploration in RL for researchers and practitioners, offering a method that reduces sample complexity, though it is incremental as it builds on existing planning and RL techniques.

The paper tackles the challenge of high sample complexity in reinforcement learning, especially with sparse rewards, by proposing Rapidly Randomly-exploring Reinforcement Learning (R3L), which uses planning algorithms like RRT to find initial solutions as demonstrations, leading to faster convergence and better asymptotic performance with fewer exploration samples.

Balancing exploration and exploitation remains a key challenge in reinforcement learning (RL). State-of-the-art RL algorithms suffer from high sample complexity, particularly in the sparse reward case, where they can do no better than to explore in all directions until the first positive rewards are found. To mitigate this, we propose Rapidly Randomly-exploring Reinforcement Learning (R3L). We formulate exploration as a search problem and leverage widely-used planning algorithms such as Rapidly-exploring Random Tree (RRT) to find initial solutions. These solutions are used as demonstrations to initialize a policy, then refined by a generic RL algorithm, leading to faster and more stable convergence. We provide theoretical guarantees of R3L exploration finding successful solutions, as well as bounds for its sampling complexity. We experimentally demonstrate the method outperforms classic and intrinsic exploration techniques, requiring only a fraction of exploration samples and achieving better asymptotic performance.

View on arXiv PDF

Similar