LG · AI · Jun 2, 2021

Expected Scalarised Returns Dominance: A New Solution Concept for Multi-Objective Decision Making

arXiv:2106.01048v3 · 15 citations
Originality: Highly original
AI Analysis

The paper addresses a largely overlooked problem in multi-objective decision making: how to define and learn a set of optimal policies when the user's utility function is unspecified and expected utility must be maximised.

The paper tackles the problem of maximising expected utility in multi-objective reinforcement learning when user preferences are unknown. It proposes a new dominance criterion, expected scalarised returns (ESR) dominance, and defines the ESR set as the corresponding solution concept. It also introduces a new algorithm for learning the ESR set in a multi-objective multi-armed bandit setting.

In many real-world scenarios, the utility of a user is derived from the single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult to specify. In such scenarios, a set of optimal policies must be learned. However, settings where the expected utility must be maximised have been largely overlooked by the multi-objective reinforcement learning community and, as a consequence, a set of optimal solutions has yet to be defined. In this paper we address this challenge by proposing first-order stochastic dominance as a criterion to build solution sets to maximise expected utility. We also propose a new dominance criterion, known as expected scalarised returns (ESR) dominance, that extends first-order stochastic dominance to allow a set of optimal policies to be learned in practice. We then define a new solution concept called the ESR set, which is a set of policies that are ESR dominant. Finally, we define a new multi-objective distributional tabular reinforcement learning (MOT-DRL) algorithm to learn the ESR set in a multi-objective multi-armed bandit setting.
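The dominance idea in the abstract can be illustrated with a minimal sketch. It assumes the usual CDF-based reading of first-order stochastic dominance for return vectors: X dominates Y if the joint CDF of X is no greater than that of Y at every threshold and strictly smaller at some threshold. The paper's exact definitions may differ in detail. The function names, the per-arm sample format, and the example arms are all hypothetical, and this is not the paper's MOT-DRL algorithm, only a check on empirical return samples.

```python
from itertools import product

def empirical_cdf(samples, v):
    """Fraction of sampled return vectors that are <= v in every objective."""
    hits = sum(all(x <= t for x, t in zip(s, v)) for s in samples)
    return hits / len(samples)

def grid_points(*sample_sets):
    """Candidate thresholds: every combination of observed per-objective values.

    Empirical CDFs are step functions that only change at observed values,
    so comparing them on this grid is enough.
    """
    n_obj = len(sample_sets[0][0])
    axes = [sorted({s[i] for ss in sample_sets for s in ss}) for i in range(n_obj)]
    return list(product(*axes))

def dominates(x_samples, y_samples):
    """True if X's empirical CDF is <= Y's everywhere and < somewhere (assumed definition)."""
    strictly_lower = False
    for v in grid_points(x_samples, y_samples):
        fx = empirical_cdf(x_samples, v)
        fy = empirical_cdf(y_samples, v)
        if fx > fy:
            return False
        if fx < fy:
            strictly_lower = True
    return strictly_lower

def non_dominated(arms):
    """Names of arms whose empirical return distribution no other arm dominates."""
    return [a for a in arms
            if not any(dominates(arms[b], arms[a]) for b in arms if b != a)]

# Hypothetical two-objective bandit: each arm maps to sampled return vectors.
arms = {
    "a": [(1, 1), (2, 2)],
    "b": [(2, 2), (3, 3)],
    "c": [(0, 4), (4, 0)],
}
print(non_dominated(arms))  # ['b', 'c']
```

In this toy example arm "b" dominates arm "a", while "b" and "c" are incomparable, so both stay in the set. Keeping every non-dominated distribution rather than a single best arm mirrors the motivation in the abstract: without a known utility function, the choice among incomparable policies is left to the user.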

