LG AI RO ML, Oct 10, 2018

Batch Active Preference-Based Learning of Reward Functions

arXiv:1810.04303v1, 137 citations
Originality: Incremental advance
AI Analysis

This work addresses data efficiency for robotics applications, but it appears incremental as it builds on existing active and preference-based learning methods.

The paper tackles the problem of efficiently learning reward functions in robotics. It develops a batch active preference-based learning algorithm that reduces both the number of data samples needed and the time spent generating queries, and it reports that, in simulated robotics tasks, only a few quickly computed queries are required.

Data generation and labeling are usually an expensive part of learning for robotics. While active learning methods are commonly used to tackle the former problem, preference-based learning is a concept that attempts to solve the latter by querying users with preference questions. In this paper, we will develop a new algorithm, batch active preference-based learning, that enables efficient learning of reward functions using as few data samples as possible while still having short query generation times. We introduce several approximations to the batch active learning problem, and provide theoretical guarantees for the convergence of our algorithms. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We then showcase our algorithm in a study to learn human users' preferences.
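To make the idea of batch active preference-based learning concrete, here is a minimal sketch, not the authors' implementation. It assumes a reward that is linear in trajectory features, a logistic model of the user's answer, a sample-based belief over the reward weights, and a naive batch rule that simply takes the highest-scoring queries. None of these choices is stated in the abstract, and all function and variable names (`preference_likelihood`, `expected_volume_removed`, `select_batch`, `reweight`, `psi`) are hypothetical.

```python
import numpy as np

def preference_likelihood(w_samples, psi, answer):
    """P(answer | w) under an assumed logistic model.

    psi is the feature difference phi(A) - phi(B) of a query;
    answer is +1 if the user prefers A and -1 if they prefer B.
    """
    return 1.0 / (1.0 + np.exp(-answer * (w_samples @ psi)))

def expected_volume_removed(w_samples, weights, psi):
    """Belief mass removed by the less informative of the two answers."""
    p_a = float(np.sum(weights * preference_likelihood(w_samples, psi, +1)))
    return min(p_a, 1.0 - p_a)

def select_batch(w_samples, weights, candidate_psis, batch_size):
    """Naive batch: indices of the batch_size highest-scoring queries."""
    scores = np.array([expected_volume_removed(w_samples, weights, psi)
                       for psi in candidate_psis])
    return np.argsort(-scores)[:batch_size]

def reweight(w_samples, weights, psi, answer):
    """Bayesian update of the sample weights after one user answer."""
    new_weights = weights * preference_likelihood(w_samples, psi, answer)
    return new_weights / new_weights.sum()

# Illustrative data only: random weight samples and random candidate queries.
rng = np.random.default_rng(0)
w_samples = rng.normal(size=(500, 3))
weights = np.full(500, 1.0 / 500)
candidate_psis = rng.normal(size=(20, 3))

batch = select_batch(w_samples, weights, candidate_psis, batch_size=5)
posterior = reweight(w_samples, weights, candidate_psis[batch[0]], answer=+1)
```

Scoring all candidates once and asking a whole batch before updating the belief is what keeps query generation cheap. A batch chosen purely by score can contain near-duplicate queries, so a practical method would also need some way to keep the batch diverse.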

Code Implementations: 1 repo
