LG | AI | Dec 17, 2025

Automatic Reward Shaping from Multi-Objective Human Heuristics

arXiv:2512.15120v1 | h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses reward function design for reinforcement learning researchers and practitioners, offering an incremental improvement by automating a manual tuning process.

The paper tackles the challenge of designing effective reward functions in multi-objective reinforcement learning by proposing MORSE, a framework that automatically combines human-designed heuristic rewards into a unified function, achieving task performance comparable to manually tuned rewards in robotic environments like MuJoCo and Isaac Sim.

Designing effective reward functions remains a central challenge in reinforcement learning, especially in multi-objective environments. In this work, we propose Multi-Objective Reward Shaping with Exploration (MORSE), a general framework that automatically combines multiple human-designed heuristic rewards into a unified reward function. MORSE formulates the shaping process as a bi-level optimization problem: the inner loop trains a policy to maximize the current shaped reward, while the outer loop updates the reward function to optimize task performance. To encourage exploration in the reward space and avoid suboptimal local minima, MORSE introduces stochasticity into the shaping process, injecting noise guided by task performance and the prediction error of a fixed, randomly initialized neural network. Experimental results in MuJoCo and Isaac Sim environments show that MORSE effectively balances multiple objectives across various robotic tasks, achieving task performance comparable to that obtained with manually tuned reward functions.
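The bi-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D task, the two heuristics, the softmax weighting, the accept-if-not-worse random search used for the outer update, and the way the noise scale is computed are all assumptions chosen to keep the example small. Every function and class name here is hypothetical. Only the overall shape follows the abstract: an inner loop that maximizes the shaped reward, an outer loop that changes the reward weights to improve task performance, and noise scaled by task performance and by the prediction error of a fixed, randomly initialized network.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def task_performance(action):
    # Hypothetical task metric: closeness of a scalar action to the target 1.0.
    return float(np.exp(-(action - 1.0) ** 2))


def inner_loop(weights, steps=200, lr=0.05):
    # Stand-in for policy training: gradient ascent on the shaped reward
    # w0 * -(a - 1)^2 + w1 * -(a^2), where the "policy" is a single scalar a.
    action = 0.0
    for _ in range(steps):
        grad = -2.0 * weights[0] * (action - 1.0) - 2.0 * weights[1] * action
        action += lr * grad
    return action


class RandomNetworkNovelty:
    """Prediction error against a fixed random network (assumed form)."""

    def __init__(self, dim, rng, hidden=8, lr=0.05):
        self.target = rng.normal(size=(hidden, dim))  # fixed, never trained
        self.predictor = np.zeros((hidden, dim))      # trained on visited weights
        self.lr = lr

    def _diff(self, x):
        return self.predictor @ x - np.tanh(self.target @ x)

    def error(self, x):
        return float(np.mean(self._diff(x) ** 2))

    def update(self, x):
        # One gradient step on the squared prediction error.
        self.predictor -= self.lr * np.outer(self._diff(x), x)


def morse_like_search(outer_steps=60, base_sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    logits = np.zeros(2)  # reward weights are softmax(logits)
    novelty = RandomNetworkNovelty(dim=2, rng=rng)
    best_perf = task_performance(inner_loop(softmax(logits)))
    for _ in range(outer_steps):
        weights = softmax(logits)
        # Assumed noise rule: more noise when performance is poor or when
        # the current weights are poorly predicted (rarely visited).
        sigma = base_sigma * ((1.0 - best_perf) + novelty.error(weights))
        candidate = logits + sigma * rng.normal(size=2)
        perf = task_performance(inner_loop(softmax(candidate)))
        novelty.update(weights)
        if perf >= best_perf:  # outer update: keep candidates that do not hurt
            logits, best_perf = candidate, perf
    return softmax(logits), best_perf


if __name__ == "__main__":
    weights, perf = morse_like_search()
    print("reward weights:", weights, "task performance:", perf)
```

In this toy setting the inner loop converges to an action equal to the first weight, so the outer loop can only improve task performance by shifting weight toward the target-reaching heuristic. A real setup would replace `inner_loop` with an RL training run and `task_performance` with an environment-level success metric.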
