AI, LG, RO | Nov 4, 2017

Guiding the search in continuous state-action spaces by learning an action sampling distribution from off-target samples

Beomjoon Kim, Leslie Pack Kaelbling, Tomas Lozano-Perez

arXiv:1711.01391v14.44 citations

Originality: Incremental advance

AI Analysis

This paper addresses long-horizon planning in robotics and offers a more robust alternative to policy estimation. The advance is incremental, since it builds on existing GAN and importance-ratio methods.

The paper tackles inefficient planning in high-dimensional continuous state-action spaces for robotics. It learns an action-sampling distribution to guide search, using a GAN with importance-ratio estimation to cope with imbalanced data, and demonstrates the approach's effectiveness on three challenging robot planning problems.

In robotics, it is essential to be able to plan efficiently in high-dimensional continuous state-action spaces for long horizons. For such complex planning problems, unguided uniform sampling of actions until a path to a goal is found is hopelessly inefficient, and gradient-based approaches often fall short when the optimization manifold of a given problem is not smooth. In this paper we present an approach that guides the search of a state-space planner, such as A*, by learning an action-sampling distribution that can generalize across different instances of a planning problem. The motivation is that, unlike typical learning approaches for planning in continuous action spaces that estimate a policy, an estimated action sampler is more robust to error since it has a planner to fall back on. We use a Generative Adversarial Network (GAN), and address an important issue: search experience consists of a relatively large number of actions that are not on a solution path and a relatively small number of actions that actually are on a solution path. We introduce a new technique, based on an importance-ratio estimation method, for using samples from a non-target distribution to make GAN learning more data-efficient. We provide theoretical guarantees and empirical evaluation in three challenging continuous robot planning problems to illustrate the effectiveness of our algorithm.
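
The general idea of reusing samples from a non-target distribution can be illustrated with plain importance weighting. The sketch below is not the paper's algorithm: it assumes both densities are known one-dimensional Gaussians (in the paper's setting the ratio would have to be estimated from data), and the names `gaussian_pdf` and `importance_weighted_mean` are hypothetical helpers introduced only for this illustration. It shows how off-target samples, reweighted by the ratio of target density to off-target density, can recover a statistic of the target distribution.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weighted_mean(samples, target_pdf, off_target_pdf):
    """Self-normalized importance-weighted estimate of the target mean.

    Each off-target sample x gets weight target_pdf(x) / off_target_pdf(x),
    so samples that are likely under the target count for more.
    """
    weights = [target_pdf(x) / off_target_pdf(x) for x in samples]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, samples)) / total


# Assumption for illustration: plentiful off-target samples come from a
# broad Gaussian, while the target is a narrow Gaussian centered at 1.0.
rng = random.Random(0)
off_target_samples = [rng.gauss(0.0, 2.0) for _ in range(20000)]

estimate = importance_weighted_mean(
    off_target_samples,
    target_pdf=lambda x: gaussian_pdf(x, 1.0, 0.5),
    off_target_pdf=lambda x: gaussian_pdf(x, 0.0, 2.0),
)
print(f"estimated target mean: {estimate:.3f}")
```

The unweighted mean of the same samples would sit near 0.0, the center of the off-target distribution. The weighted estimate should land near the target mean of 1.0 instead, which is the sense in which abundant off-target data can stand in for scarce on-target data.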

