AI · Aug 21, 2024

Advances in Preference-based Reinforcement Learning: A Review

arXiv:2408.11943v1 · 19 citations · h-index: 51
Originality: Synthesis-oriented
AI Analysis

This is a review paper that synthesizes existing work for researchers in reinforcement learning, providing an overview of PbRL's progress and challenges.

The paper reviews preference-based reinforcement learning (PbRL), which tackles the problem of RL's dependency on engineered reward functions by using human preferences as feedback, and it presents a unified framework covering recent advances, theoretical guarantees, and applications.

Reinforcement Learning (RL) algorithms suffer from the dependency on accurately engineered reward functions to properly guide the learning agents to do the required tasks. Preference-based reinforcement learning (PbRL) addresses that by utilizing human preferences as feedback from the experts instead of numeric rewards. Due to its promising advantage over traditional RL, PbRL has gained more focus in recent years with many significant advances. In this survey, we present a unified PbRL framework to include the newly emerging approaches that improve the scalability and efficiency of PbRL. In addition, we give a detailed overview of the theoretical guarantees and benchmarking work done in the field, while presenting its recent applications in complex real-world tasks. Lastly, we go over the limitations of the current approaches and the proposed future research directions.
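To make the idea of learning from preferences rather than numeric rewards concrete, here is a minimal sketch of one common formulation in the PbRL literature: fitting a reward model to pairwise comparisons under a Bradley-Terry likelihood. It is not taken from the survey. The linear reward, the two toy features, the function names, and the hyperparameters are all assumptions made for illustration.

```python
import math

def segment_return(weights, segment):
    """Sum of an assumed linear reward w . phi(s) over a segment's feature vectors."""
    return sum(sum(w * x for w, x in zip(weights, features)) for features in segment)

def preference_probability(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    return 1.0 / (1.0 + math.exp(-diff))

def fit_reward(preferences, n_features, lr=0.1, epochs=200):
    """Fit reward weights by gradient ascent on the preference log-likelihood.

    preferences: list of (preferred_segment, other_segment) pairs, where each
    segment is a list of feature vectors (hypothetical toy representation).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, other in preferences:
            p = preference_probability(weights, preferred, other)
            for i in range(n_features):
                # d/dw_i log p = (1 - p) * (summed feature i of preferred - of other)
                feat_diff = sum(f[i] for f in preferred) - sum(f[i] for f in other)
                weights[i] += lr * (1.0 - p) * feat_diff
    return weights

# Toy data (assumed): feature 0 is "progress", feature 1 is "energy used".
# The labeler prefers the segment with more progress in both comparisons.
toy_preferences = [
    ([[1.0, 0.2], [1.0, 0.1]], [[0.1, 0.2], [0.0, 0.3]]),
    ([[0.8, 0.5]], [[0.2, 0.5]]),
]
learned = fit_reward(toy_preferences, n_features=2)
```

In a full PbRL pipeline the learned reward would then drive a standard RL algorithm. That step is omitted here, and practical systems typically replace the linear model with a neural network.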

