LG | AI | Jan 13, 2025

Performance Optimization of Ratings-Based Reinforcement Learning

arXiv:2501.07755v1 | 1 citation | h-index: 23
Originality: Synthesis-oriented
AI Analysis

The paper addresses the sensitivity of RbRL to hyperparameters, which matters to researchers and practitioners, but its contribution is incremental: it focuses on optimizing an existing method rather than introducing new ones.

This paper investigates optimization methods to enhance the performance of rating-based reinforcement learning (RbRL), which infers reward functions from human ratings for policy learning, by analyzing the impact of hyperparameters and providing guidelines for their selection.

This paper explores multiple optimization methods to improve the performance of rating-based reinforcement learning (RbRL). RbRL uses human ratings to infer reward functions in reward-free environments, so that a policy can then be learned with standard reinforcement learning, which requires a reward function to be available. Specifically, RbRL minimizes a cross-entropy loss that quantifies the difference between human ratings and the estimated ratings derived from the inferred reward. Hence, a low loss means a high degree of consistency between human ratings and estimated ratings. Despite its simple form, RbRL has various hyperparameters and can be sensitive to various factors. Therefore, comprehensive experiments are needed to understand how these hyperparameters affect the performance of RbRL. This paper is a work in progress that gives users some general guidelines on how to select hyperparameters in RbRL.
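To make the loss concrete, here is a minimal Python sketch of a rating cross-entropy. The abstract does not give the formula, so everything specific below is an assumption for illustration: the function names, the soft binning rule that turns a normalized return into rating-class probabilities, the class `boundaries`, and the sharpness parameter `k`. It is not the authors' implementation.

```python
import math

def rating_probabilities(norm_return, boundaries, k=10.0):
    """Map a normalized return in [0, 1] to probabilities over rating classes.

    Assumed form: class i spans [boundaries[i], boundaries[i + 1]], and its
    score -k * (r - lo) * (r - hi) is positive inside the bin and negative
    outside it. A softmax over the scores gives the estimated rating.
    """
    scores = [
        -k * (norm_return - lo) * (norm_return - hi)
        for lo, hi in zip(boundaries[:-1], boundaries[1:])
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rating_cross_entropy(norm_returns, human_ratings, boundaries, k=10.0):
    """Mean cross-entropy between human ratings and estimated ratings."""
    losses = []
    for r, label in zip(norm_returns, human_ratings):
        probs = rating_probabilities(r, boundaries, k)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)

# Hypothetical data: three trajectories, three rating classes (0 = worst).
boundaries = [0.0, 0.33, 0.66, 1.0]
norm_returns = [0.1, 0.5, 0.9]  # returns under the inferred reward, normalized

consistent = rating_cross_entropy(norm_returns, [0, 1, 2], boundaries)
inconsistent = rating_cross_entropy(norm_returns, [2, 0, 1], boundaries)
print(consistent < inconsistent)
```

In this sketch the loss is lower when the human ratings agree with the bins that the inferred returns fall into, which matches the abstract's reading of a low loss as high consistency. The values of `k`, `boundaries`, and the number of rating classes are examples of settings such a loss could be sensitive to.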

