Meta-Learning for Multi-objective Reinforcement Learning
This work addresses the challenge of efficiently approximating Pareto optimal policies in multi-objective reinforcement learning. The contribution is incremental, as it builds on existing MORL and meta-learning approaches.
The paper frames the search for Pareto optimal policies in multi-objective reinforcement learning as a meta-learning problem over preferences. The authors report that this yields a better approximation of the Pareto optimal solutions, in terms of both optimality and computational efficiency, as evaluated on continuous control tasks with high degrees of freedom.
Multi-objective reinforcement learning (MORL) is the generalization of standard reinforcement learning (RL) approaches to sequential decision-making problems that consist of several, possibly conflicting, objectives. Generally, in such formulations, no single policy optimizes all the objectives simultaneously; instead, a number of policies have to be found, each optimal for a particular preference over the objectives. We therefore frame MORL as a meta-learning problem in which the task distribution is given by a distribution over the preferences. We demonstrate that such a formulation results in a better approximation of the Pareto optimal solutions in terms of both optimality and computational efficiency. We evaluate our method on obtaining Pareto optimal policies for a number of continuous control problems with high degrees of freedom.
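To make the "distribution over preferences as task distribution" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a preference is a non-negative weight vector summing to one, that preferences are drawn from a flat Dirichlet distribution, and that a vector-valued reward is collapsed by linear scalarization; none of these choices is stated in the abstract. The function names `sample_preference`, `scalarize`, and `sample_task_batch` are hypothetical.

```python
import random
from typing import List, Sequence


def sample_preference(num_objectives: int, rng: random.Random) -> List[float]:
    """Draw one preference vector from the probability simplex.

    Assumption: preferences follow a flat Dirichlet distribution, obtained
    here by normalizing independent Gamma(1, 1) draws.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(reward_vector: Sequence[float], preference: Sequence[float]) -> float:
    """Assumed linear scalarization: preference-weighted sum of the rewards."""
    if len(reward_vector) != len(preference):
        raise ValueError("reward vector and preference must have equal length")
    return sum(w * r for w, r in zip(preference, reward_vector))


def sample_task_batch(num_tasks: int, num_objectives: int, seed: int = 0) -> List[List[float]]:
    """Sample a batch of meta-learning tasks, one preference vector per task."""
    rng = random.Random(seed)
    return [sample_preference(num_objectives, rng) for _ in range(num_tasks)]


if __name__ == "__main__":
    # Each sampled preference defines one single-objective RL task whose
    # reward is the scalarized version of the environment's reward vector.
    for preference in sample_task_batch(num_tasks=3, num_objectives=2):
        print(preference, scalarize([1.0, -0.5], preference))
```

Under these assumptions, a meta-learner would adapt a shared policy to each sampled preference, and the set of adapted policies would serve as the approximation of the Pareto front.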