AI · Jun 11, 2015

Bootstrapping Skills

Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

arXiv:1506.03624v1 · 2.9

Originality: Incremental advance

AI Analysis

This work addresses the computational inefficiency of reinforcement learning in large domains, though it appears incremental because it builds on existing skill-based methods.

The paper tackles inefficient policy learning in complex Markov Decision Processes. Instead of a monolithic policy representation, it proposes combining simpler, parameterized skills, and it demonstrates that this approach can solve MDPs that monolithic methods with the same representational power cannot.

The monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions. For the monolithic approach to succeed (and this is not always possible), a complex feature representation is often necessary, since the policy is a complex object that has to prescribe what action to take across the entire state space. This is especially true in large domains with complicated dynamics. It is also computationally inefficient to both learn and plan in MDPs using a complex monolithic approach. We present a different approach in which we restrict the policy space to policies that can be represented as combinations of simpler, parameterized skills: temporally extended actions with a simple policy representation. We introduce Learning Skills via Bootstrapping (LSB), which can use a broad family of Reinforcement Learning (RL) algorithms as a "black box" to iteratively learn parameterized skills. Initially, the learned skills are short-sighted, but each iteration of the algorithm allows the skills to bootstrap off one another, improving each skill in the process. We prove that this bootstrapping process returns a near-optimal policy. Furthermore, our experiments demonstrate that LSB can solve MDPs that, given the same representational power, could not be solved by a monolithic approach. Thus, planning with learned skills results in better policies without requiring complex policy representations.
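The bootstrapping idea can be illustrated with a minimal sketch. Every concrete choice here is an assumption made for illustration and is not the authors' implementation. The sketch uses a hypothetical deterministic chain MDP and a fixed partition of its states with one skill per partition. Tabular value iteration stands in for the "black box" RL learner, and the names `step`, `make_partitions`, `learn_skill`, `evaluate` and `lsb` are all hypothetical. Each skill is learned only on its own partition. When it leaves that partition, it bootstraps from the value of the current skill set at the exit state.

```python
# Minimal, hypothetical sketch of skill bootstrapping (not the authors' code).
# Toy MDP (assumption): deterministic chain of N states; action 0 = left, 1 = right.
# Reaching the rightmost state gives reward 1 and ends the episode.
N = 12
GAMMA = 0.9
GOAL = N - 1
ACTIONS = (0, 1)


def step(s, a):
    """Chain dynamics: returns (next_state, reward, done)."""
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, N - 1)
    if s2 == GOAL:
        return s2, 1.0, True
    return s2, 0.0, False


def make_partitions(n, size):
    """Split the non-goal states into contiguous blocks, one skill per block."""
    return [list(range(i, min(i + size, n - 1))) for i in range(0, n - 1, size)]


def learn_skill(part, V, iters=100):
    """Stand-in for the black-box RL learner: value iteration restricted to one
    partition. Leaving the partition ends the skill, and the exit state's value
    under the current skill set, V[s2], is used as the bootstrap target."""
    inside = set(part)
    V_local = {s: 0.0 for s in part}

    def backup(s, a):
        s2, r, done = step(s, a)
        if done:
            return r
        if s2 in inside:
            return r + GAMMA * V_local[s2]
        return r + GAMMA * V[s2]  # bootstrap off the neighbouring skill

    for _ in range(iters):
        V_local = {s: max(backup(s, a) for a in ACTIONS) for s in part}
    # Greedy skill policy; max() returns the first maximum, so ties go to action 0 (left).
    return {s: max(ACTIONS, key=lambda a: backup(s, a)) for s in part}


def evaluate(policy, iters=200):
    """Value of the combined policy that runs the skill owning each state."""
    V = [0.0] * N
    for _ in range(iters):
        V_new = [0.0] * N
        for s, a in policy.items():
            s2, r, done = step(s, a)
            V_new[s] = r if done else r + GAMMA * V[s2]
        V = V_new
    return V


def lsb(partitions, K):
    """Bootstrapping loop: re-learn every skill K times against the value of
    the skills learned in the previous iteration."""
    V = [0.0] * N
    policy = {}
    for _ in range(K):
        for part in partitions:
            policy.update(learn_skill(part, V))
        V = evaluate(policy)
    return policy, V


if __name__ == "__main__":
    parts = make_partitions(N, size=4)
    for K in (1, 2, 3):
        policy, _ = lsb(parts, K)
        print(K, [policy[s] for s in range(N - 1)])
```

With three partitions, the first iteration only teaches the skill next to the goal to move right. The other skills see zero value at their exits, so they remain short-sighted. Each further iteration propagates value one partition further, so `K = 3` yields a policy that moves right in every state. This toy only illustrates the bootstrapping mechanism. The near-optimality guarantee is the paper's result and is not demonstrated here.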
