LG · Nov 23, 2021

Adaptive Multi-Goal Exploration

Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric

arXiv:2111.12045v24.44 citations

Originality: Highly original

AI Analysis

This paper addresses the challenge of learning goal-conditioned policies efficiently. It offers reinforcement learning researchers and practitioners a novel theoretical framework with practical applications.

The paper tackles efficient multi-goal exploration in unknown environments by introducing AdaGoal, a goal selection scheme that adaptively targets goals based on uncertainty. The resulting algorithm is provably near-minimax optimal, requiring $\tilde{O}(L^3 S A ε^{-2})$ exploration steps in the tabular case, and it extends to linear mixture Markov decision processes with linear function approximation.
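To get a feel for how the stated rate scales, the leading term $L^3 S A ε^{-2}$ can be evaluated directly. The sketch below is only illustrative: the function name `exploration_steps_scale` is hypothetical, and it deliberately ignores the constants and logarithmic factors hidden by the $\tilde{O}$ notation, so its output is a relative scale, not a step count.

```python
def exploration_steps_scale(L: float, S: int, A: int, eps: float) -> float:
    """Leading term L^3 * S * A / eps^2 of the tabular bound.

    Assumption: constants and log factors are dropped, so only ratios
    between two calls are meaningful, not the absolute values.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (L ** 3) * S * A / (eps ** 2)


if __name__ == "__main__":
    base = exploration_steps_scale(L=10, S=20, A=4, eps=0.5)
    finer = exploration_steps_scale(L=10, S=20, A=4, eps=0.25)
    farther = exploration_steps_scale(L=20, S=20, A=4, eps=0.5)
    print("halving eps multiplies the term by", finer / base)
    print("doubling L multiplies the term by", farther / base)
```

Under this reading, the horizon $L$ is the most expensive parameter: the term grows cubically in $L$, quadratically in $1/ε$, and only linearly in $S$ and $A$.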

We introduce a generic strategy for provably efficient multi-goal exploration. It relies on AdaGoal, a novel goal selection scheme that leverages a measure of uncertainty in reaching states to adaptively target goals that are neither too difficult nor too easy. We show how AdaGoal can be used to tackle the objective of learning an $ε$-optimal goal-conditioned policy for the (initially unknown) set of goal states that are reachable within $L$ steps in expectation from a reference state $s_0$ in a reward-free Markov decision process. In the tabular case with $S$ states and $A$ actions, our algorithm requires $\tilde{O}(L^3 S A ε^{-2})$ exploration steps, which is nearly minimax optimal. We also readily instantiate AdaGoal in linear mixture Markov decision processes, yielding the first goal-oriented PAC guarantee with linear function approximation. Beyond its strong theoretical guarantees, we anchor AdaGoal in goal-conditioned deep reinforcement learning, both conceptually and empirically, by connecting its idea of selecting "uncertain" goals to maximizing value ensemble disagreement.
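The idea of targeting goals that are "neither too difficult nor too easy" can be illustrated with a small sketch in the spirit of the value ensemble disagreement connection mentioned above. This is not the authors' implementation: the function `select_goal`, the use of the ensemble mean as a reachability proxy, and the use of the ensemble standard deviation as the uncertainty measure are all assumptions made for illustration.

```python
import numpy as np


def select_goal(distance_estimates, L):
    """Pick the most uncertain goal among those that look reachable.

    distance_estimates: array-like of shape (n_members, n_goals); row i is
        ensemble member i's estimate of the expected number of steps from
        the reference state s_0 to each candidate goal (hypothetical input).
    L: reachability threshold on the estimated distance.

    Returns the index of the selected goal, or None if no candidate is
    estimated to be reachable within L steps.
    """
    est = np.asarray(distance_estimates, dtype=float)
    mean_distance = est.mean(axis=0)   # assumed proxy for goal difficulty
    disagreement = est.std(axis=0)     # assumed proxy for uncertainty

    plausible = mean_distance <= L     # filter out goals that look too hard
    if not plausible.any():
        return None

    # Goals that are easy and well understood have low disagreement, so
    # maximizing disagreement over plausible goals skips them as well.
    scores = np.where(plausible, disagreement, -np.inf)
    return int(np.argmax(scores))


if __name__ == "__main__":
    estimates = [[1.0, 2.0, 10.0],
                 [1.0, 4.0, 16.0]]
    print(select_goal(estimates, L=5))
```

In this toy input, the first goal is close and the ensemble agrees on it, the third has the highest disagreement but looks farther than $L = 5$ steps away, so the second goal is the one selected.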
