UCB-driven Utility Function Search for Multi-objective Reinforcement Learning
This work addresses the challenge of optimising trade-offs between conflicting objectives in MORL and offers an incremental improvement to decomposition-based methods.
The paper tackles the problem of efficiently searching for promising weight vectors in multi-objective reinforcement learning in order to approximate Pareto fronts, and it reports consistent, strong performance on MuJoCo benchmarks.
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours that trade off between multiple, possibly conflicting, objectives. Decomposition-based MORL is a family of solution methods that use a number of utility functions to decompose the multi-objective problem into individual single-objective problems, which are solved simultaneously in order to approximate a Pareto front of policies. We focus on the case of linear utility functions parametrised by weight vectors w. We introduce a method based on the Upper Confidence Bound (UCB) to efficiently search for the most promising weight vectors at different stages of the learning process, with the aim of maximising the hypervolume of the resulting Pareto front. The proposed method demonstrates consistency and strong performance across various MORL baselines on MuJoCo benchmark problems. The code is released at: https://github.com/SYCAMORE-1/ucb-MOPPO
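To make the idea of a UCB-driven search over weight vectors concrete, the following is a minimal sketch and not the released implementation. It assumes a standard UCB1 bandit in which each candidate weight vector is an arm and the observed payoff is some measure of hypervolume improvement. The function names (`scalarise`, `ucb_select`, `update`), the exploration constant `c`, and the payoff signal are all hypothetical choices made for illustration; the paper's actual acquisition rule may differ.

```python
import math


def scalarise(reward_vector, weights):
    """Linear utility: weighted sum of the per-objective rewards."""
    return sum(w * r for w, r in zip(weights, reward_vector))


def ucb_select(mean_gain, counts, total_pulls, c=1.0):
    """Return the index of the candidate weight vector with the highest UCB1 score.

    mean_gain[i] is the running mean payoff of candidate i (assumed here to be
    a hypervolume improvement), counts[i] is how often it has been selected.
    """
    # Any candidate that has never been tried is selected first.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [
        m + c * math.sqrt(math.log(total_pulls) / n)
        for m, n in zip(mean_gain, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def update(mean_gain, counts, i, gain):
    """Incrementally update the running mean payoff of candidate i."""
    counts[i] += 1
    mean_gain[i] += (gain - mean_gain[i]) / counts[i]
```

In a training loop, one would call `ucb_select` to pick a weight vector w, train the corresponding policy on the scalarised reward `scalarise(r, w)`, measure the resulting change in hypervolume, and pass that value to `update`. The exploration term shrinks as a weight vector is selected more often, so rarely tried vectors continue to be revisited.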