LG · AI · RO · Oct 17, 2024

ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization

arXiv:2410.13837v3 · 8 citations · h-index: 15 · ICLR
Originality: Incremental advance
AI Analysis

ORSO addresses a key bottleneck in RL for complex tasks by automating reward design, though the advance is incremental because it builds on existing reward shaping methods.

The paper tackles the challenge of efficiently selecting effective shaping reward functions in reinforcement learning for complex tasks. It proposes ORSO, which reduces the data needed to evaluate a shaping reward function, cuts computational time by up to 8 times, outperforms prior methods by more than 50%, and on average matches expert-engineered rewards.

Reward shaping is critical in reinforcement learning (RL), particularly for complex tasks where sparse rewards can hinder learning. However, choosing effective shaping rewards from a set of reward functions in a computationally efficient manner remains an open challenge. We propose Online Reward Selection and Policy Optimization (ORSO), a novel approach that frames the selection of a shaping reward function as an online model selection problem. ORSO automatically identifies performant shaping reward functions without human intervention and comes with provable regret guarantees. We demonstrate ORSO's effectiveness across various continuous control tasks. Compared to prior approaches, ORSO significantly reduces the amount of data required to evaluate a shaping reward function, resulting in superior data efficiency and a significant reduction in computational time (up to 8 times). ORSO consistently identifies high-quality reward functions, outperforming prior methods by more than 50%, and on average identifies policies as performant as the ones learned using reward functions manually engineered by domain experts.
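To make the "online model selection" framing concrete, here is a minimal sketch of choosing among candidate shaping rewards with a bandit-style rule. It is not ORSO's algorithm: the abstract does not specify the selection rule, so a standard UCB rule is swapped in as an assumption. All names (`ucb_select`, `select_reward_online`, `make_toy_evaluator`) are hypothetical, and the toy evaluator stands in for "train a policy briefly under candidate reward i, then score it on the task reward".

```python
import math
import random


def ucb_select(counts, means, t, c=1.0):
    """Pick the candidate with the highest upper confidence bound.

    Candidates that have never been tried are selected first.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: means[i] + c * math.sqrt(math.log(t) / counts[i]),
    )


def select_reward_online(train_and_eval, num_candidates, budget, c=1.0):
    """Spend `budget` training rounds across candidate reward functions.

    `train_and_eval(i)` is assumed to train briefly under candidate i and
    return a scalar task score (higher is better).
    """
    counts = [0] * num_candidates
    means = [0.0] * num_candidates
    for t in range(1, budget + 1):
        i = ucb_select(counts, means, t, c)
        score = train_and_eval(i)
        counts[i] += 1
        means[i] += (score - means[i]) / counts[i]  # running average
    best = max(range(num_candidates), key=lambda i: means[i])
    return best, counts, means


def make_toy_evaluator(qualities, seed=0, noise=0.1):
    """Toy stand-in (assumption): each candidate has a fixed hidden quality,
    observed with bounded uniform noise. Real policy training is
    non-stationary, which this sketch ignores."""
    rng = random.Random(seed)

    def train_and_eval(i):
        return qualities[i] + rng.uniform(-noise, noise)

    return train_and_eval


if __name__ == "__main__":
    evaluator = make_toy_evaluator([0.2, 0.5, 0.9])
    best, counts, means = select_reward_online(evaluator, 3, budget=200)
    print("selected candidate:", best)
    print("rounds per candidate:", counts)
```

The point of the sketch is the budget allocation: rather than training every candidate to completion, most rounds flow to the candidate that currently looks best, which is the kind of saving in evaluation data the abstract describes.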

Code Implementations: 1 repo
