LG HC Nov 30, 2022

Time-Efficient Reward Learning via Visually Assisted Cluster Ranking

Berkeley
arXiv:2212.00169v1, 8 citations, h-index: 60
Originality: Incremental advance
AI Analysis

This work addresses the problem of inefficient human feedback in reward learning for AI agents, offering an incremental improvement to existing comparison-based methods.

The paper tackles the bottleneck of expensive, time-consuming human comparison labeling in reward learning by batching comparisons through data visualization and an interactive GUI. Given the same labeling time, this greatly increases agent performance on simple MuJoCo tasks.

One of the most successful paradigms for reward learning uses human feedback in the form of comparisons. Although these methods hold promise, human comparison labeling is expensive and time-consuming, constituting a major bottleneck to their broader applicability. Our insight is that we can greatly improve how effectively human time is used in these approaches by batching comparisons together, rather than having the human label each comparison individually. To do so, we leverage data dimensionality-reduction and visualization techniques to provide the human with an interactive GUI displaying the state space, in which the user can label subportions of the state space. Across some simple MuJoCo tasks, we show that this high-level approach holds promise and is able to greatly increase the performance of the resulting agents, provided the same amount of human labeling time.
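
The batching idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the use of PCA for the projection, the helper names `project_2d` and `comparisons_from_ranked_clusters`, the random stand-in states, and the way the "human" regions are chosen are all assumptions made for illustration. The point it shows is that ranking a few regions of the projected state space yields many pairwise comparisons from a handful of human actions.

```python
import itertools
import numpy as np


def project_2d(states):
    """Project states onto their top-2 principal components (PCA via SVD)."""
    centered = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def comparisons_from_ranked_clusters(ranked_clusters):
    """Expand clusters ordered worst-to-best into pairwise labels.

    Each returned pair (i, j) means "state j is preferred over state i".
    """
    pairs = []
    for low, high in itertools.combinations(ranked_clusters, 2):
        pairs.extend((i, j) for i in low for j in high)
    return pairs


rng = np.random.default_rng(0)
states = rng.normal(size=(60, 8))  # hypothetical stand-in for agent states
points = project_2d(states)        # what a 2D GUI view might display

# Hypothetical stand-in for the human's selections: three regions split
# along the first projected axis, ranked from worst to best.
order = np.argsort(points[:, 0])
ranked_clusters = [order[:20], order[20:40], order[40:]]

pairs = comparisons_from_ranked_clusters(ranked_clusters)
print(len(pairs))  # 3 cluster pairs * 20 * 20 states = 1200 comparisons
```

Here three region selections and one ranking stand in for 1200 individual comparison labels; the resulting pairs could then be fed to any comparison-based reward learner.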

