LG · RO · ML · May 12, 2018

Task Transfer by Preference-Based Cost Learning

arXiv:1805.04686v3 · 56 citations
Originality: Incremental advance
AI Analysis

The paper addresses the difficulty of obtaining expert demonstrations or cost functions for robotic action planning, though the advance appears incremental because it builds on existing methods such as Adversarial MaxEnt IRL.

The paper tackles the problem of task transfer in reinforcement learning by relaxing the need for exactly-relevant expert demonstrations or explicitly-coded cost functions, instead using expert preference as guidance; it achieves this through a novel framework that alternates between preference-based selection and cost learning, with extensive simulations verifying effectiveness.

The goal of task transfer in reinforcement learning is to migrate the action policy of an agent from the source task to the target task. Despite their successes in robotic action planning, current methods mostly rely on one of two requirements: exactly-relevant expert demonstrations or an explicitly-coded cost function for the target task, both of which are inconvenient to obtain in practice. In this paper, we relax these two strong conditions by developing a novel task transfer framework in which expert preference is used as guidance. In particular, we alternate between the following two steps. First, experts apply pre-defined preference rules to select related expert demonstrations for the target task. Second, based on the selection result, we learn the target cost function and trajectory distribution simultaneously via enhanced Adversarial MaxEnt IRL, and generate more trajectories from the learned target distribution for the next preference selection. Theoretical analyses of the distribution learning and of the convergence of the proposed algorithm are provided. Extensive simulations on several benchmarks have been conducted to further verify the effectiveness of the proposed method.
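The alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: each trajectory is reduced to a single scalar feature, the preference rule (keep the half of the pool closest to a target value) is a hypothetical stand-in for the expert's rules, and a closed-form Gaussian fit replaces the enhanced Adversarial MaxEnt IRL step. All function names and parameters (`preference_select`, `fit_cost`, `alternate`, `keep_frac`, `min_std`) are assumptions made for illustration.

```python
import random
import statistics


def preference_select(trajs, target, keep_frac=0.5):
    """Hypothetical preference rule: keep the trajectories closest to `target`."""
    ranked = sorted(trajs, key=lambda x: abs(x - target))
    k = max(2, int(len(ranked) * keep_frac))  # stdev needs at least 2 points
    return ranked[:k]


def fit_cost(selected, min_std=0.3):
    """Stand-in for the cost-learning step (NOT adversarial MaxEnt IRL).

    Fits a quadratic cost c(x) = (x - mu)^2 / (2 * std^2), whose maximum-entropy
    distribution p(x) proportional to exp(-c(x)) is a Gaussian N(mu, std^2).
    `min_std` is an assumed floor that keeps the sampler from collapsing.
    """
    mu = statistics.fmean(selected)
    std = max(statistics.stdev(selected), min_std)
    return mu, std


def alternate(source_demos, target, rounds=20, n_samples=200, seed=0):
    """Alternate preference selection and distribution fitting."""
    rng = random.Random(seed)
    pool = list(source_demos)
    history = []
    for _ in range(rounds):
        selected = preference_select(pool, target)  # step 1: selection
        mu, std = fit_cost(selected)                # step 2: cost/distribution
        history.append(mu)
        # Generate new trajectories from the learned distribution
        # for the next round of preference selection.
        pool = [rng.gauss(mu, std) for _ in range(n_samples)]
    return mu, std, history


if __name__ == "__main__":
    rng = random.Random(1)
    demos = [rng.gauss(0.0, 1.0) for _ in range(200)]  # source-task demos
    mu, std, history = alternate(demos, target=3.0)
    print("learned mean:", round(mu, 2), "learned std:", round(std, 2))
```

In this toy setting the fitted mean drifts from the source distribution toward the preferred region over successive rounds, which mirrors the role of repeated selection in the abstract. It says nothing about the convergence guarantees the paper claims for its actual algorithm.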

