LG · Jul 18, 2025

Preference-based Multi-Objective Reinforcement Learning

arXiv:2507.14066v1 · 13 citations · h-index: 4 · IEEE Trans Autom Sci Eng
Originality: Incremental advance
AI Analysis

This work addresses the challenge of balancing conflicting objectives in multi-objective reinforcement learning (MORL), with applications such as energy management and autonomous driving. It offers a more flexible alternative to hand-designed rewards, although the contribution is incremental: it integrates preferences into an existing framework.

The paper addresses the difficulty of designing reward functions in MORL by using preferences to guide policy optimization. It reports competitive performance on benchmark tasks as well as on multi-energy management and autonomous driving tasks, surpassing an oracle method that uses the ground-truth rewards.

Multi-objective reinforcement learning (MORL) is a structured approach for optimizing tasks with multiple objectives. However, it often relies on pre-defined reward functions, which can be hard to design when goals conflict and may lead to oversimplification. Preferences can serve as more flexible and intuitive decision-making guidance, eliminating the need for complicated reward design. This paper introduces preference-based MORL (Pb-MORL), which formalizes the integration of preferences into the MORL framework. We theoretically prove that preferences can derive policies across the entire Pareto frontier. To guide policy optimization using preferences, our method constructs a multi-objective reward model that aligns with the given preferences. We further prove that optimizing this reward model is equivalent to training the Pareto-optimal policy. Extensive experiments on benchmark multi-objective tasks, a multi-energy management task, and an autonomous driving task on a multi-lane highway show that our method performs competitively, surpassing the oracle method, which uses the ground-truth reward function. This highlights its potential for practical applications in complex real-world systems.
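The abstract says the method fits a multi-objective reward model to preferences, but it does not give the model's form. The sketch below illustrates one common way to do this under stated assumptions. It assumes a Bradley-Terry preference model, in which a simulated teacher compares two trajectory segments under a weight vector w over the objectives and prefers the segment with the higher w-weighted return. A vector-valued reward model is then fitted by minimizing the cross-entropy of P(a preferred over b | w) = sigmoid(w . (R(a) - R(b))). The linear reward model, the helper names (`score_diff`, `bt_loss`, `train`, `make_pairs`), and the synthetic data are all hypothetical illustrations, not the paper's implementation.

```python
import math
import random

# Hypothetical sketch, NOT the paper's code: fit a linear, vector-valued reward
# model from weight-conditioned pairwise preferences with a Bradley-Terry
# likelihood, P(a preferred over b | w) = sigmoid(w . (R(a) - R(b))).


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def feature_sum(segment):
    """Sum the per-step feature vectors of a trajectory segment."""
    return [sum(col) for col in zip(*segment)]


def score_diff(theta, w, d):
    """Return w . (R(a) - R(b)) for a linear model.

    Objective k's predicted return is theta[k] . sum_t x_t, so the difference
    depends only on d = feature_sum(a) - feature_sum(b).
    """
    return sum(wk * sum(t * x for t, x in zip(theta_k, d)) for wk, theta_k in zip(w, theta))


def bt_loss(theta, data):
    """Mean Bradley-Terry cross-entropy over (w, d, y) triples; y = 1 if a is preferred."""
    total = 0.0
    for w, d, y in data:
        z = score_diff(theta, w, d)
        total += softplus(-z) if y == 1 else softplus(z)
    return total / len(data)


def train(data, n_obj, n_feat, lr=0.05, epochs=100, seed=0):
    """Plain SGD on bt_loss, using dLoss/dtheta[k][j] = (sigmoid(z) - y) * w[k] * d[j]."""
    rng = random.Random(seed)
    theta = [[0.0] * n_feat for _ in range(n_obj)]
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            w, d, y = data[i]
            g = sigmoid(score_diff(theta, w, d)) - y
            for k in range(n_obj):
                for j in range(n_feat):
                    theta[k][j] -= lr * g * w[k] * d[j]
    return theta


def make_pairs(true_theta, n_pairs, seg_len=5, seed=1):
    """Synthetic data: a simulated teacher ranks two random segments by w . R under true_theta."""
    rng = random.Random(seed)
    n_obj, n_feat = len(true_theta), len(true_theta[0])
    data = []
    for _ in range(n_pairs):
        e = [rng.expovariate(1.0) for _ in range(n_obj)]
        s = sum(e)
        w = [x / s for x in e]  # random weight vector on the probability simplex
        seg_a = [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(seg_len)]
        seg_b = [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(seg_len)]
        d = [a - b for a, b in zip(feature_sum(seg_a), feature_sum(seg_b))]
        y = 1 if score_diff(true_theta, w, d) > 0 else 0
        data.append((w, d, y))
    return data


# Usage: two objectives, three per-step features.
true_theta = [[1.0, -0.5, 0.0], [0.0, 0.5, 1.0]]
data = make_pairs(true_theta, n_pairs=200)
theta0 = [[0.0] * 3 for _ in range(2)]
theta = train(data, n_obj=2, n_feat=3)
print(bt_loss(theta0, data))  # log(2) ~= 0.693, since every z is 0 at theta0
print(bt_loss(theta, data))
```

In this sketch, a single weight vector w scales the whole predicted return vector. For that reason, varying w across queries is what allows the per-objective components `theta[k]` to be identified separately. With a fixed w, only the weighted combination of objectives could be learned.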
