LG | AI | Jan 4, 2024

Policy-regularized Offline Multi-objective Reinforcement Learning

arXiv:2401.02244v1 | 10 citations | h-index: 7 | AAMAS
Originality: Incremental advance
AI Analysis

This work addresses offline multi-objective reinforcement learning (MORL) for applications that require efficient policy learning from fixed datasets, though it appears incremental because it builds on existing single-objective methods.

The paper tackles offline multi-objective reinforcement learning by extending policy-regularized methods to handle preference-inconsistent demonstrations. It proposes filtering and regularization solutions to that problem and reduces computational cost by using a single network to represent multiple policies. Empirical results on various datasets demonstrate the method's capability in solving offline MORL problems.

In this paper, we aim to train a policy for multi-objective RL using only offline trajectory data. To this end, we extend the offline policy-regularized method, a widely adopted approach for single-objective offline RL, to the multi-objective setting. However, such methods face a new challenge in offline MORL: the preference-inconsistent demonstration problem. We propose two solutions to this problem: 1) filtering out preference-inconsistent demonstrations by approximating behavior preferences, and 2) adopting regularization techniques with high policy expressiveness. Moreover, we integrate the preference-conditioned scalarized update method into policy-regularized offline RL so that a single policy network learns a set of policies simultaneously, reducing the computational cost of training a large number of individual policies for various preferences. Finally, we introduce Regularization Weight Adaptation to dynamically determine appropriate regularization weights for arbitrary target preferences during deployment. Empirical results on various multi-objective datasets demonstrate the capability of our approach in solving offline MORL problems.
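The two ideas of scalarizing a vector return under a preference and filtering out preference-inconsistent demonstrations can be illustrated with a minimal sketch. This is not the paper's implementation. The function names, the rule that approximates a trajectory's behavior preference by normalizing its vector return, the L1 distance, and the threshold are all assumptions made for illustration, and the sketch assumes non-negative per-objective returns.

```python
from typing import List, Sequence


def scalarize(vector_return: Sequence[float], preference: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of per-objective returns."""
    return sum(w * r for w, r in zip(preference, vector_return))


def approx_behavior_preference(vector_return: Sequence[float]) -> List[float]:
    """Hypothetical approximation: normalize the vector return onto the simplex.

    Assumes every per-objective return is non-negative.
    """
    if any(r < 0 for r in vector_return):
        raise ValueError("this sketch assumes non-negative returns")
    total = sum(vector_return)
    if total == 0:
        # No signal about the preference: fall back to a uniform preference.
        n = len(vector_return)
        return [1.0 / n] * n
    return [r / total for r in vector_return]


def filter_consistent(
    trajectory_returns: Sequence[Sequence[float]],
    target_preference: Sequence[float],
    max_l1_distance: float,
) -> List[int]:
    """Return indices of trajectories whose approximated behavior preference
    lies within max_l1_distance (L1 norm) of the target preference."""
    kept = []
    for idx, ret in enumerate(trajectory_returns):
        w_hat = approx_behavior_preference(ret)
        dist = sum(abs(a - b) for a, b in zip(w_hat, target_preference))
        if dist <= max_l1_distance:
            kept.append(idx)
    return kept


# Example: three trajectories with two objectives each.
returns = [[8.0, 2.0], [5.0, 5.0], [1.0, 9.0]]
target = [0.8, 0.2]
kept_indices = filter_consistent(returns, target, max_l1_distance=0.3)
scalar_values = [scalarize(returns[i], target) for i in kept_indices]
```

In a policy-regularized learner, only the kept trajectories would contribute to the behavior-cloning (regularization) term for the given target preference, while the scalarized value would drive the policy-improvement term. How the paper actually combines the two is not specified in this summary.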

Code Implementations: 1 repo