LGJul 29, 2025

Enabling Pareto-Stationarity Exploration in Multi-Objective Reinforcement Learning: A Multi-Objective Weighted-Chebyshev Actor-Critic Approach

arXiv:2507.21397v1h-index: 11CDC
Originality Incremental advance
AI Analysis

It addresses an under-explored challenge in MORL for applications requiring systematic exploration with theoretical guarantees, though it appears incremental as it integrates existing frameworks.

The paper tackles the problem of exploring Pareto-stationary solutions in multi-objective reinforcement learning with non-convex rewards, proposing the MOCHA algorithm that achieves a sample complexity of O(ε^{-2}) and outperforms baselines on a KuaiRand dataset.

In many multi-objective reinforcement learning (MORL) applications, being able to systematically explore the Pareto-stationary solutions under multiple non-convex reward objectives with theoretical finite-time sample complexity guarantee is an important and yet under-explored problem. This motivates us to take the first step and fill the important gap in MORL. Specifically, in this paper, we propose a \uline{M}ulti-\uline{O}bjective weighted-\uline{CH}ebyshev \uline{A}ctor-critic (MOCHA) algorithm for MORL, which judiciously integrates the weighted-Chebychev (WC) and actor-critic framework to enable Pareto-stationarity exploration systematically with finite-time sample complexity guarantee. Sample complexity result of MOCHA algorithm reveals an interesting dependency on $p_{\min}$ in finding an $ε$-Pareto-stationary solution, where $p_{\min}$ denotes the minimum entry of a given weight vector $\mathbf{p}$ in WC-scarlarization. By carefully choosing learning rates, the sample complexity for each exploration can be $\tilde{\mathcal{O}}(ε^{-2})$. Furthermore, simulation studies on a large KuaiRand offline dataset, show that the performance of MOCHA algorithm significantly outperforms other baseline MORL approaches.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes