RO LG

Jul 18, 2019

Composing Diverse Policies for Temporally Extended Tasks

Daniel Angelov, Yordan Hristov, Michael Burke, Subramanian Ramamoorthy

arXiv:1907.08199v3

18 citations

Originality: Incremental advance

AI Analysis

This paper addresses the combinatorial difficulty of hierarchical motion planning for robots performing sequenced tasks. It offers a way to integrate controllers that operate on different information streams, time scales, and action spaces. The contribution appears incremental, since it builds on existing hierarchical and planning methods.

The paper tackles the challenge of composing diverse robot control policies for complex, temporally extended tasks. It introduces a method that sequences motion planning trajectories, dynamic motion primitives, and neural network controllers using a global goal scoring estimator and expert demonstrations. It demonstrates robustness to action and model dynamics mismatch on an MDP benchmark and efficiently solves a physical gear assembly task on a PR2 robot.

Robot control policies for temporally extended and sequenced tasks are often characterized by discontinuous switches between different local dynamics. These change-points are often exploited in hierarchical motion planning to build approximate models and to facilitate the design of local, region-specific controllers. However, it becomes combinatorially challenging to implement such a pipeline for complex temporally extended tasks, especially when the sub-controllers work on different information streams, time scales and action spaces. In this paper, we introduce a method that can compose diverse policies comprising motion planning trajectories, dynamic motion primitives and neural network controllers. We introduce a global goal scoring estimator that uses local, per-motion primitive dynamics models and corresponding activation state-space sets to sequence diverse policies in a locally optimal fashion. We use expert demonstrations to convert what is typically viewed as a gradient-based learning process into a planning process without explicitly specifying pre- and post-conditions. We first illustrate the proposed framework using an MDP benchmark to showcase robustness to action and model dynamics mismatch, and then with a particularly complex physical gear assembly task, solved on a PR2 robot. We show that the proposed approach successfully discovers the optimal sequence of controllers and solves both tasks efficiently.
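The sequencing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes each primitive exposes an activation set (as a membership test) and a local dynamics model that predicts the state after the controller runs, and it replaces the learned goal scoring estimator with a hand-written negative distance to the goal. All names (`Primitive`, `goal_score`, `sequence_policies`, and the toy primitives `reach`, `align`, `insert`, `retreat`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[float, float]


@dataclass
class Primitive:
    """A controller summarised by its activation set and a local dynamics model."""
    name: str
    active: Callable[[State], bool]   # is the state inside the activation set?
    model: Callable[[State], State]   # predicted state after running the controller


def goal_score(state: State, goal: State) -> float:
    # Assumption: negative Euclidean distance stands in for the learned
    # goal scoring estimator described in the abstract.
    return -((state[0] - goal[0]) ** 2 + (state[1] - goal[1]) ** 2) ** 0.5


def sequence_policies(start: State, goal: State, primitives: List[Primitive],
                      max_steps: int = 10, tol: float = 1e-6):
    """Greedily pick, at each step, the active primitive whose predicted
    outcome scores highest. This is only locally optimal."""
    state, plan = start, []
    for _ in range(max_steps):
        if -goal_score(state, goal) <= tol:
            break  # goal reached
        candidates = [p for p in primitives if p.active(state)]
        if not candidates:
            break  # no controller is applicable in this state
        best = max(candidates, key=lambda p: goal_score(p.model(state), goal))
        predicted = best.model(state)
        if goal_score(predicted, goal) <= goal_score(state, goal):
            break  # no primitive is predicted to improve the score
        plan.append(best.name)
        state = predicted
    return plan, state


# Toy primitives on a 2-D state; purely illustrative.
PRIMITIVES = [
    Primitive("retreat", lambda s: True, lambda s: (0.0, 0.0)),
    Primitive("reach", lambda s: s[0] < 1.0, lambda s: (1.0, s[1])),
    Primitive("align", lambda s: s[0] >= 1.0 and s[1] < 1.0, lambda s: (s[0], 1.0)),
    Primitive("insert", lambda s: s[0] >= 1.0 and s[1] >= 1.0, lambda s: (2.0, 1.0)),
]

if __name__ == "__main__":
    plan, final = sequence_policies((0.0, 0.0), (2.0, 1.0), PRIMITIVES)
    print(plan, final)
```

In this toy setup the planner never needs explicit pre- and post-conditions: the activation tests decide which primitives are applicable, and the score on the predicted state decides which one to run next. A greedy one-step choice like this can stall in cases where a longer lookahead would succeed.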
