Demonstration Guided Multi-Objective Reinforcement Learning
The work addresses the difficulty of MORL in scenarios that require trade-offs between multiple objectives, though it appears incremental, since it builds on prior demonstrations and existing MORL methods.
The paper tackles the challenge of training policies from scratch in multi-objective reinforcement learning (MORL) by introducing demonstration-guided MORL (DG-MORL). DG-MORL uses prior demonstrations aligned with user preferences and a self-evolving mechanism that refines suboptimal demonstrations. The authors report empirical superiority over existing MORL algorithms and provide an upper bound on the algorithm's sample complexity.
Multi-objective reinforcement learning (MORL) is increasingly relevant because it resembles real-world scenarios that require trade-offs between multiple objectives. Because MORL must cater to diverse user preferences, it amplifies the challenges that traditional reinforcement learning already faces. To address the difficulty of training policies from scratch in MORL, we introduce demonstration-guided multi-objective reinforcement learning (DG-MORL). This novel approach uses prior demonstrations, aligns them with user preferences via corner weight support, and incorporates a self-evolving mechanism to refine suboptimal demonstrations. Our empirical studies demonstrate DG-MORL's superiority over existing MORL algorithms, establishing its robustness and efficacy, particularly under challenging conditions. We also provide an upper bound on the algorithm's sample complexity.
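The abstract does not spell out the mechanics, so the following Python sketch is only a rough illustration under stated assumptions. It covers three pieces. First, it computes corner weights for two objectives under linear scalarization, meaning the weights where the upper envelope of scalarized returns changes slope, as in Optimistic Linear Support. Second, it pairs each corner weight with the demonstration whose return vector scores highest there. Third, it applies a simplified "self-evolving" rule that adds an agent rollout as a new demonstration when it beats the current best demonstration at a given weight. The two-objective restriction, the function names (`corner_weights`, `align_demonstrations`, `self_evolve`), and the replacement rule are all hypothetical and are not DG-MORL's actual implementation.

```python
from typing import Dict, List, Tuple

# A return vector for two objectives (hypothetical two-objective setting).
Vector = Tuple[float, float]


def scalarize(w1: float, v: Vector) -> float:
    """Linear utility w . v with weight vector w = (w1, 1 - w1)."""
    return w1 * v[0] + (1.0 - w1) * v[1]


def corner_weights(values: List[Vector], tol: float = 1e-9) -> List[float]:
    """Return the w1 values where max_i w . v_i changes slope,
    plus the simplex extremes w1 = 0 and w1 = 1."""
    candidates = {0.0, 1.0}
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            a, b = values[i], values[j]
            # Solve w1*a0 + (1-w1)*a1 = w1*b0 + (1-w1)*b1 for w1.
            denom = (a[0] - a[1]) - (b[0] - b[1])
            if abs(denom) < tol:
                continue  # parallel utility lines never cross
            w1 = (b[1] - a[1]) / denom
            if 0.0 < w1 < 1.0:
                best = max(scalarize(w1, v) for v in values)
                # Keep the crossing only if it lies on the upper envelope.
                if scalarize(w1, a) >= best - tol:
                    candidates.add(w1)
    return sorted(candidates)


def align_demonstrations(demos: Dict[str, Vector],
                         weights: List[float]) -> Dict[float, str]:
    """Map each weight to the demonstration with the highest scalarized return."""
    return {w1: max(demos, key=lambda name: scalarize(w1, demos[name]))
            for w1 in weights}


def self_evolve(demos: Dict[str, Vector], w1: float,
                name: str, rollout_return: Vector) -> bool:
    """Hypothetical rule: store the agent's rollout as a new demonstration
    if it beats every existing demonstration at weight w1."""
    best = max(scalarize(w1, v) for v in demos.values())
    if scalarize(w1, rollout_return) > best:
        demos[name] = rollout_return
        return True
    return False


if __name__ == "__main__":
    # Made-up demonstration returns, for illustration only.
    demos = {"d_a": (1.0, 0.0), "d_b": (0.0, 1.0), "d_c": (0.6, 0.6)}
    weights = corner_weights(list(demos.values()))
    print(weights)
    print(align_demonstrations(demos, weights))
    print(self_evolve(demos, 0.5, "rollout_1", (0.9, 0.9)))
```

Only the extreme weights give an unambiguous best demonstration in this toy example. At interior corner weights, two demonstrations tie by construction, so which one is chosen depends on dictionary order.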