LG | AI | Aug 16, 2022

PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

arXiv:2208.07914v3 | 65 citations | h-index: 36
Originality: Highly original
AI Analysis

PD-MORL addresses scalability and adaptability issues in MORL for continuous robotic tasks, offering a more efficient approach for real-world applications where preferences change dynamically.

The paper tackles the problem of dynamically changing objectives and constraints in multi-objective reinforcement learning by proposing PD-MORL, an algorithm that trains a single universal network to cover the entire preference space. It reports up to 25% larger hypervolume and an order of magnitude fewer trainable parameters than prior methods.
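For readers unfamiliar with the hypervolume metric cited above, here is a minimal sketch for the two-objective case, assuming both objectives are maximized. The function name `hypervolume_2d`, the reference point, and the sample front are illustrative assumptions, not taken from the paper or its code.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by `ref` (both objectives maximized)."""
    # Keep only points that strictly improve on the reference in both objectives.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective downward.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # This point contributes a new strip above the best second objective seen so far.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Illustrative (made-up) Pareto front with three trade-off points.
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
hv = hypervolume_2d(front, ref=(0.0, 0.0))
```

A larger hypervolume means the set of solutions covers more of the objective space relative to the reference point, so it rewards both better and more diverse trade-offs.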

Multi-objective reinforcement learning (MORL) approaches have emerged to tackle many real-world problems with multiple conflicting objectives by maximizing a joint objective function weighted by a preference vector. These approaches find fixed customized policies corresponding to preference vectors specified during training. However, the design constraints and objectives typically change dynamically in real-life scenarios. Furthermore, storing a policy for each potential preference is not scalable. Hence, obtaining a set of Pareto front solutions for the entire preference space in a given domain in a single training run is critical. To this end, we propose a novel MORL algorithm that trains a single universal network to cover the entire preference space and that scales to continuous robotic tasks. The proposed approach, Preference-Driven MORL (PD-MORL), utilizes the preferences as guidance to update the network parameters. It also employs a novel parallelization approach to increase sample efficiency. We show that PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
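The "joint objective function weighted by a preference vector" can be illustrated with plain linear scalarization. The sketch below is not the PD-MORL update rule; it only shows how a preference vector turns a vector of per-objective returns into a single score, and why a separate stored policy per preference becomes a lookup problem. All names (`sample_preference`, `scalarize`, `candidate_returns`, `best_for`) and the return values are hypothetical.

```python
import random


def sample_preference(num_objectives, rng):
    """Draw a preference vector uniformly from the probability simplex."""
    # Normalized Exp(1) draws follow Dirichlet(1, ..., 1), i.e. uniform on the simplex.
    draws = [rng.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(returns, preference):
    """Linear scalarization: preference-weighted sum of per-objective returns."""
    return sum(w * r for w, r in zip(preference, returns))


# Hypothetical per-objective returns (e.g. speed, energy efficiency) for three stored policies.
candidate_returns = {
    "fast": [9.0, 2.0],
    "balanced": [6.0, 6.0],
    "frugal": [2.0, 9.0],
}


def best_for(preference):
    """Pick the stored policy whose scalarized return is highest for this preference."""
    return max(
        candidate_returns,
        key=lambda name: scalarize(candidate_returns[name], preference),
    )


rng = random.Random(0)
omega = sample_preference(2, rng)
choice = best_for(omega)
```

In this toy setup every new preference requires a stored policy to choose from. The approach described in the abstract instead conditions one network on the preference vector, so the table of stored policies is no longer needed.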

Code Implementations: 1 repo
