Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning
This work addresses the problem of efficiently finding optimal trade-offs in multi-objective reinforcement learning for researchers and practitioners, though it is incremental as it builds on existing decomposition approaches with new guarantees.
The paper tackles the challenge of obtaining a Pareto front of policies in multi-objective reinforcement learning by introducing Iterated Pareto Referent Optimisation (IPRO), which decomposes the problem into constrained single-objective subproblems and guarantees convergence with an upper bound on distance to undiscovered solutions, matching or outperforming existing methods in hypervolume and utility-based metrics.
An important challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies to attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), which decomposes finding the Pareto front into a sequence of constrained single-objective problems. This enables us to guarantee convergence while providing an upper bound on the distance to undiscovered Pareto optimal solutions at each step. We evaluate IPRO using utility-based metrics and its hypervolume and find that it matches or outperforms methods that require additional assumptions. By leveraging problem-specific single-objective solvers, our approach also holds promise for applications beyond multi-objective reinforcement learning, such as planning and pathfinding.